ABSTRACT

We report weak-lensing masses for 51 of the most X-ray luminous galaxy clusters known. This
cluster sample, introduced earlier in this series of papers, spans redshifts 0.15 < z_cl < 0.7, and
is well suited to calibrate mass proxies for current cluster cosmology experiments. Cluster 
masses are measured with a standard 'color-cut' lensing method from three-filter photometry 
of each field. Additionally, for 27 cluster fields with at least five-filter photometry, we mea- 
sure high-accuracy masses using a new method that exploits all information available in the 
photometric redshift posterior probability distributions of individual galaxies. Using simula- 
tions based on the COSMOS-30 catalog, we demonstrate control of systematic biases in the 
mean mass of the sample with this method, from photometric redshift biases and associated 
uncertainties, to better than 3%. In contrast, we show that the use of single-point estimators 
in place of the full photometric redshift posterior distributions can lead to significant redshift- 
dependent biases on cluster masses. The performance of our new photometric redshift-based 
method allows us to calibrate 'color-cut' masses for all 51 clusters in the present sample to 
a total systematic uncertainty of ≈ 7% on the mean mass, a level sufficient to significantly
improve current cosmology constraints from galaxy clusters. Our results bode well for future 
cosmological studies of clusters, potentially reducing the need for exhaustive spectroscopic 
calibration surveys as compared to other techniques, when deep, multi-filter optical and near- 
IR imaging surveys are coupled with robust photometric redshift methods. 

Key words: galaxies: clusters: general; gravitational lensing: weak; methods: data analysis; 
methods: statistical; galaxies: distances and redshifts; cosmology: observations 



1 INTRODUCTION 

Galaxy clusters have become a cornerstone of the experimental evidence
supporting the standard ΛCDM cosmological model. Recent
studies of statistical samples of clusters have placed precise
and robust constraints on fundamental parameters, including the 
amplitude of the matter power spectrum, the dark energy equa- 
tion of state, and departures from General Relativity on large 
scales. For a review of recent progress and future prospects, see 
Allen, Evrard, & Mantz (2011).

Typical galaxy cluster number count experiments require a
mass-observable scaling relation to infer cluster masses from sur- 
vey data, which in turn requires calibration of the mass-proxy bias 
and scatter. Weak lensing follow-up of clusters can be used, and 
to some extent has already been used, to set the absolute calibra- 
tions for the mass-observable relations employed in current X-ray 
and optical cluster count surveys (e.g. Mantz et al. 2008, 2010a;
Vikhlinin et al. 2009b; Rozo et al. 2010). However, targeted weak
lensing follow-up efforts of cluster surveys have not yet studied 
a sufficient number of clusters nor have demonstrated a sufficient 
control over systematic uncertainties to meaningfully impact on 
cosmological constraints. 

For the current generation of X-ray cluster surveys drawn from
ROSAT observations (e.g. Ebeling et al. 1998; Böhringer et al.
2004; Burenin et al. 2007; Ebeling et al. 2010), the uncertainty
in the absolute mass calibration of the survey proxy, which is
of order ≈ 15%, dominates the systematic uncertainty on
the matter power spectrum normalization σ_8 (Mantz et al. 2010a;
Vikhlinin et al. 2009b). For Mantz et al. (2010a), the current limits
on this systematic uncertainty are derived from simulations of
non-thermal pressure support in relaxed clusters (e.g. Nagai et al.
2007) and uncertainties in the Chandra calibration, whereas for
Vikhlinin et al. (2009a) the limits are derived from weak lensing
calibrations (Hoekstra 2007; Zhang et al. 2008), quoted as a 9%
uncertainty but neglecting an additional systematic uncertainty on the
lensing masses known to be at least 10% (Mahdavi et al. 2008). The
absolute mass calibration from weak lensing follow-up therefore 
needs to be accurate to better than 15% to impact significantly on 
current work. Future surveys will face even more stringent system- 
atics requirements on the absolute calibration of multiwavelength 
mass proxies if they are to utilize fully their statistical potential. For 
example, the Dark Energy Survey will require an absolute mass
calibration at the 5% level for the dark energy constraints to be within
10% of their maximum potential sensitivity (Wu et al. 2010),
requiring a combination of weak lensing and high-precision mass
proxies, i.e. X-ray observations. Similar arguments apply to cluster
surveys across the electromagnetic spectrum, e.g. the South Pole
Telescope (SPT; Williamson et al. 2011) and eRosita (Predehl et al.
2010).

To achieve such calibration with weak lensing, one needs 
to follow up a large sample of clusters. For individual clusters 
weak lensing typically offers mass measurements with a precision
of ≈ 30% (Becker & Kravtsov 2011; Okabe et al. 2010; Hoekstra
2007), driven approximately equally by a limited number of well
measured galaxies and line of sight structure. However, simulations
show that weak-lensing measurements can in principle provide
accurate, approximately unbiased, estimates of the mean mass
for statistical samples of galaxy clusters (Becker & Kravtsov 2011;
Corless & King 2007). Small systematic biases in the mean mass
can still arise from, e.g., the details of the assumed mass model, 
shear calibration, and the lensed-galaxy redshift distribution. Such 
sources of uncertainty, in particular the lensed-galaxy redshift dis- 
tribution, have not yet been sufficiently understood for upcoming, 
or even current, surveys, as we show in this work. 

In the Weighing the Giants project, we aim to provide absolute 
mass-calibration for galaxy cluster mass proxies, including specif- 
ically X-ray mass proxies, to better than 10% accuracy. We have 
gathered extensive optical imaging of 51 clusters in at least three 
wide photometric filters, where clusters are mostly drawn from the
X-ray selected cosmological cluster sample of Mantz et al. (2010a)
and the relaxed cluster sample from Allen et al. (2008). Of these,
27 were observed in at least five filters. The clusters span a redshift 
range of 0.15 < z < 0.7. To ensure an accurate mass-calibration, we
have pursued a 'blind' analysis where we have deliberately delayed 
comparing our lensing masses to X-ray masses and the lensing 
masses of others reported in the literature. Such a simple procedure 
prevents us from introducing observer's bias into our results. Given 
the redshift range, data quality, filter coverage, and blind analysis, 
our study represents the most extensive analysis of its type to date, 
and should be considered a pathfinder for the challenges facing up- 
coming optical, submillimetre, and X-ray cluster surveys. 

Here, we report weak-lensing masses for the 51 clusters in 
our sample, and show that the total systematic uncertainty on the 
mean mass of the sample is controlled to ~7%. In particular, we 
focus on controlling systematic uncertainties associated with the
redshift distribution of lensed galaxies. We approach this problem
in two ways. For the entire sample, we employ a standard analysis
technique (the "color-cut" method; Hoekstra 2007), albeit with
some improvements, where the lensed redshift distribution for each
cluster field is estimated from separate, deep field photometric
redshift (photo-z) measurements. We show that this method alone does
not sufficiently control systematic uncertainties to the accuracy
required for current surveys. Alternatively, the redshift distribution of
background galaxies may be measured using photometric redshift
estimates in fields with at least five-filter coverage. While previous
large photometric surveys (Wolf et al. 2004; Ilbert et al. 2009)
have shown that high fidelity photometric redshift point estimators
are possible through the use of many (e.g., greater than 15) broad,
medium, and narrow band filters for objects down to i < 25 magnitude,
observations of cluster fields usually lack coverage with such
a comprehensive array of photometric filters and future optical
surveys will typically have only six broad filters. We show that with
such limited photometric coverage, photo-z point estimates are
insufficient to recover unbiased cluster masses. We therefore develop
a method that uses the full photo-z posterior probability distribution
P(z) for individual galaxies in each cluster field, referred to
as the "P(z)" method, and show that it can be used to measure robust
cluster weak-lensing masses. Using the COSMOS-30 photo-z
catalog (Ilbert et al. 2009), we create a series of simulations to
test the sensitivity of our reconstructed masses to photo-z errors.
We show that P(z) distributions from current photometric redshift
codes, with B_J V_J R_C I_C z+ photometry, enable control of systematic
uncertainties on the mean mass for the sample to better than 2%
accuracy for clusters at 0.15 < z < 0.7, a result that provides
significant encouragement for future cluster-cosmology work.

This is the third in a series of papers describing the project.
Paper I describes our cluster sample, data reduction procedures, and
shear measurements (von der Linden et al. 2012). Paper II details
our photometric redshift measurements, including the development
of a scattered-light correction for SuprimeCam, and an improved
relative photometric calibration procedure based on fitting the
stellar locus (Kelly et al. 2012). This paper reports our lensing masses
and estimates of systematic uncertainties in the sample mean mass. 
Forthcoming papers will use these accurate cluster masses to cal- 
ibrate X-ray mass proxies and determine improved cosmological 
constraints. 

The structure of this paper is as follows. In Section 2, we
review cluster mass measurements with weak lensing. We describe
our data set and analysis procedures in Section 3. In Section 4,
we develop and apply our implementation of the color-cut method
to all clusters in the sample. In Sections 5 & 6, we introduce
our photo-z lensing framework that incorporates photo-z posterior
probability distributions for each galaxy observed. In Section 7,
we investigate the expected systematic errors present in mass
measurements given the empirical performance of photo-z estimators.
In Section 8, we report measured masses using both the P(z) and
color-cut methods, and cross-calibrate the color-cut results. In
Section 9, we perform checks of other potential systematic
uncertainties. We compare our lensing mass measurements to other efforts
in the literature, based on overlapping samples, in Section 10, and
we provide concluding remarks in Section 11.

Unless otherwise noted, all mass measurements assume a flat
ΛCDM reference cosmology with Ω_m = 0.3, Ω_Λ = 0.7 and H_0 =
100 h km/s/Mpc, where h = 0.7.

Figure 1. The reduced shear g, as a function of source galaxy redshift z,
for cluster lenses at redshifts z_cluster = 0.2 and 0.5. The function β_s, a ratio
of angular diameter distances, controls the shape of the curve (see Eq. 2).
β_s is zero for sources at redshifts less than z_cluster and rises steeply above
z_cluster, eventually flattening off at high redshift. The shape of the function
is cosmology dependent. A typical galaxy redshift distribution for a typical
ground-based i+ < 25 mag survey is shown in light gray, peaking at z ~ 0.8.



2 WEAK-LENSING MASS MEASUREMENTS 

The mass of a gravitational lens, in this case a massive galaxy 
cluster, may be inferred from the systematic distortion of images 
of background galaxies as measured by the reduced shear. For a
review of weak lensing, see Bartelmann & Schneider (2001) and
Schneider (2006). Here, we review the redshift dependence of the
reduced shear and how it relates to the cluster mass profile. 

The ellipticity of a galaxy, corrected for PSF effects, provides 
a noisy estimate of the reduced shear at the galaxy position. Assum- 
ing a single lens plane, the theoretical expectation for the reduced 
shear g(θ) is given by

\[
g(\theta) = \frac{\beta_s(z_b)\,\gamma_\infty(\theta)}{1 - \beta_s(z_b)\,\kappa_\infty(\theta)} , \tag{1}
\]

where the shear γ_∞ and convergence κ_∞ are set by the mass
distribution of the lens, evaluated at the source position θ, assuming
a lensed source at infinite redshift. For an axisymmetric lens, Eq. 1
reduces to a scalar equation, as the only shear will be tangential
to the lens. The dependence of the distortion on the background-galaxy
redshift is set by β_s(z_b):

\[
\beta_s(z_b) = \frac{D_{LS}}{D_S}\,\frac{D_\infty}{D_{L\infty}} . \tag{2}
\]

β_s is a ratio of angular diameter distances, where D_LS is the distance
between the lens and the source, D_S is the distance to the
source, and D_L∞ and D_∞ are the corresponding distances from the
lens and the observer to a source at infinite redshift, respectively.
Figure 1 shows how the reduced shear g scales as a function of
background-galaxy redshift for lenses at two redshifts. For reference,
a typical analytical approximation to a ground-based i+ < 25
magnitude redshift distribution, peaking at z ~ 0.8, is shown as the
shaded region (Schrabback et al. 2010). β_s rises rapidly from zero
for redshifts just beyond a lens, and approaches a constant value at
high redshift.
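The redshift scaling described above can be illustrated numerically. The following is a minimal sketch, not the analysis code of this work: it assumes the flat reference cosmology with Ω_m = 0.3, uses the flat-universe identity D_LS/D_S = 1 - χ_L/χ_S (χ being the comoving distance), and the function names are hypothetical.

```python
import numpy as np


def comoving_distance(z, omega_m=0.3, n_steps=4001):
    """Comoving distance in units of c/H0 for a flat LCDM cosmology.

    Substituting a = u**2 (a = 1/(1+z)) turns the integral of dz/E(z) into
    the integral of 2 / sqrt(omega_m + omega_lambda * u**6) from
    (1+z)**-0.5 to 1, which stays finite and smooth even for z -> infinity.
    """
    omega_l = 1.0 - omega_m
    u_min = 0.0 if np.isinf(z) else (1.0 + z) ** -0.5
    u = np.linspace(u_min, 1.0, n_steps)
    integrand = 2.0 / np.sqrt(omega_m + omega_l * u**6)
    # Trapezoidal rule, written out explicitly.
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(u)))


def beta_s(z_source, z_lens, omega_m=0.3):
    """Eq. 2 for a flat universe: (D_LS / D_S) / (D_Linf / D_inf)."""
    if z_source <= z_lens:
        return 0.0  # foreground and cluster galaxies are not lensed
    chi_l = comoving_distance(z_lens, omega_m)
    chi_s = comoving_distance(z_source, omega_m)
    chi_inf = comoving_distance(np.inf, omega_m)
    # In a flat universe D_LS / D_S = 1 - chi_L / chi_S.
    return (1.0 - chi_l / chi_s) / (1.0 - chi_l / chi_inf)


def reduced_shear(gamma_inf, kappa_inf, beta):
    """Eq. 1: reduced shear for a source with lensing efficiency beta."""
    return beta * gamma_inf / (1.0 - beta * kappa_inf)
```

Evaluating `beta_s` on a grid of source redshifts for z_lens = 0.2 and 0.5 reproduces the qualitative behaviour of Figure 1: zero below the lens redshift, a steep rise just behind it, and a plateau at high redshift.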

To facilitate comparisons to other mass proxies, especially X- 
ray proxies, we measure the total mass enclosed within a sphere of 
fixed radius. While a general 2D mass distribution can in principle
be recovered (Bradač et al. 2005), this approach would be limited
by the depth of our images and an inability to break the mass sheet
degeneracy from weak-lensing data alone (Bradač et al. 2004),
especially for low redshift clusters that fill the SuprimeCam field
of view. Another alternative is to measure the mass within a 2D 
aperture, which determines the total projected mass within a cylin- 
der. Operationally, aperture mass measurements would require us 
to deproject an ill-constrained, noisy, 2D mass profile to make the 
needed comparison to X-ray mass measurements, and requires an 
assumed profile at large radius to break the mass sheet degeneracy. 

We instead fit the estimated reduced shear at each galaxy position
to the lensing signal predicted by a spherical Navarro-Frenk-White
(NFW) halo profile (Navarro et al. 1997). The parametrized
mass profile, known to be a reasonable description of dark matter
halos, automatically breaks the mass-sheet degeneracy. The NFW
profile has two free parameters, the scale radius r_s and the
concentration c_200 = r_200/r_s (where overdensity is defined with respect
to the critical density), or alternatively the mass within a particular
radius. We implement the detailed radial, lens redshift, and cosmology
dependence of γ_∞ and κ_∞ for a spherical NFW profile found in
Wright & Brainerd (2000). Extensive simulation work in the literature
shows that fitting such a profile to the reduced shear, averaged
over a sample of clusters, can in principle return an unbiased
mass, depending on details in the analysis. Triaxiality, nearby
correlated structure, and uncorrelated structure along the line of sight
contribute 20-25% scatter to individual mass measurements, not
including the statistical uncertainty due to the finite number of lensed
sources (Hoekstra 2003; Corless & King 2007; Becker & Kravtsov
2011; Bahé et al. 2011; Hoekstra et al. 2011). Efforts are underway
to verify this result for the mass range spanned by clusters in our
sample (M_500 > 10^15 M_⊙).
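To make the NFW parametrization concrete, the sketch below converts a scale radius and concentration into M_200 and the mass enclosed within an arbitrary sphere, using the standard NFW enclosed-mass function m(x) = ln(1+x) - x/(1+x). It is an illustration under the reference cosmology only; the shear and convergence expressions of Wright & Brainerd (2000) used in the actual fits are not reproduced here, and the function names are hypothetical.

```python
import numpy as np

G = 4.30091e-9  # gravitational constant in Mpc (km/s)^2 / Msun


def rho_crit(z, h=0.7, omega_m=0.3):
    """Critical density in Msun / Mpc^3 for a flat LCDM cosmology."""
    hz_squared = (100.0 * h) ** 2 * (omega_m * (1.0 + z) ** 3 + 1.0 - omega_m)
    return 3.0 * hz_squared / (8.0 * np.pi * G)


def _nfw_m(x):
    """Dimensionless NFW enclosed-mass function."""
    return np.log(1.0 + x) - x / (1.0 + x)


def nfw_mass(r, r_s, c200, z):
    """Mass (Msun) within a sphere of radius r (Mpc) for an NFW halo.

    r_200 = c200 * r_s encloses a mean density of 200 times the critical
    density at the cluster redshift z.
    """
    r200 = c200 * r_s
    m200 = 200.0 * rho_crit(z) * (4.0 / 3.0) * np.pi * r200**3
    return m200 * _nfw_m(r / r_s) / _nfw_m(c200)
```

For example, `nfw_mass(2.0, 0.5, 4.0, 0.3)` returns M_200 for a halo with r_s = 0.5 Mpc and c_200 = 4 at z = 0.3, which is of order 10^15 M_⊙.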



3 DATA & PROCESSING 

In this section, we describe the data set, data processing, and sam- 
ple selection used as input to the mass measurement algorithms. 
We analyze a sample of 51 X-ray selected, luminous galaxy clusters
imaged with SuprimeCam (Miyazaki et al. 2002) at the Subaru
Telescope and Megaprime at the Canada-France-Hawaii Telescope.
Paper I contains a detailed description of the clusters observed,
filters, and processing details. All clusters in the sample were imaged
with at least three broad optical filters, and 27 were imaged with at
least five broad optical filters. Raw CCD exposures were processed
using a modified GaBoDS/Theli pipeline (Erben et al. 2005). We
detect objects using SExtractor (Bertin & Arnouts 1996). Shape
measurements were made with the code analyseldac (Erben et al.
2001), based on the KSB algorithm (Kaiser et al. 1995), to produce
shear catalogs. Shape measurements were calibrated using the
STEP2 simulations (Massey et al. 2007).

The heterogeneous nature of our dataset requires us to adopt 
two different strategies to measure the redshift distributions of 
galaxies in each cluster field. For the 27 cluster fields where we 
have five or more filters, we compute photometric redshifts for each 
galaxy in our shear catalogs. Photometric redshifts require strict 
control of the relative photometric calibration between filters. Paper 
II describes the position-dependent zeropoint corrections (a "star 
flat") and a custom implementation of the stellar locus regression 
technique that we use to calibrate colors. For photo-z calculations,
we use the BPZ code (Benítez 2000) with templates optimized by
Capak et al. (2007). Paper II includes detailed quality checks on the
photo-z calculations. For the remaining 24 cluster fields where we
have observations in fewer than five filters, we use the COSMOS-30
photometric redshift catalog (Ilbert et al. 2009) as a reference deep
field for a traditional "color-cut" analysis.

Figure 2. The number of background galaxies for the MACSJ0417.5-1154
field (as a function of R_C magnitude), after lensing quality cuts. The black
histogram shows the distribution of unsaturated objects that are detected
in an image region with an exposure weight of at least half of the maximum
value. Because the initial object detection is highly complete, most of
these detections are smaller than the minimum size requirement for lensing
(r_h > 1.15 r_h*); the magenta histogram shows objects that survive this size
cut. Our minimum signal-to-noise requirement (S/N > 3) removes further
objects at the faint end (blue histogram). We furthermore reject objects with
exceptionally large shear estimates, small values of P_g, or a small KSB filter
size; these cuts remove only a few objects at the faint end (green histogram).
To ensure robust photometric redshift estimates, we remove objects with
i+ > 25, which is the COSMOS-30 completeness limit (red histogram).
These cuts (and those applied in Fig. 3) are applied to the catalogs used
for both the P(z) and the color-cut methods. For the color-cut method, we
estimate the detection completeness magnitude from this histogram, shown
here as the vertical red line.

We apply a series of cuts to the galaxy catalogs, based on the 
shape and photometry measurements, to minimize bias while maxi- 
mizing sensitivity. We do not use measurements based on tangential 
shear, measured mass of the cluster, or X-ray derived mass when
establishing these cuts. The color-cut method and the P(z) method
use a similar set of cuts, which are described below. Differences,
where they exist, are noted explicitly. In the color-cut analysis,
we keep between 1,500 and 15,000 galaxies per cluster field. For
the photo-z based analysis, roughly 500 to 5,000 galaxies remain,
where the main difference is a redshift cut that only selects galaxies
behind the cluster. Figure 2 illustrates how some of the major cuts,
detailed below, affect the object number counts in an example field.

Lensing quality cuts: For the lensing analysis we require that an
object ellipticity is measured with S/N > 3, as defined by analyseldac.
We also require that the objects are 15% larger than the
point spread function (PSF) of the observation, as measured by the
median half-light radius (r_h, reported by analyseldac) of stars in the
image (r_h > 1.15 r_h*; see Paper I and Section 6 for motivation of
these cuts). These criteria remove a large fraction of the initially
detected objects (Fig. 2). In addition, we guard against failures in
the shape measurement code by accepting only galaxies with a minimum
KSB filter size r_g > 1.5 pixels, measured shear |g| < 1.4, and
shear susceptibility P_g > 0.1. We do not explicitly cut on objects
that are close to one another (a "nearest neighbor" cut); we have
verified that the explicit removal of objects that have nearest
companions within a radius 3 r_g does not induce a systematic shift in our
mass measurements. We also remove large objects from the catalog,
as these objects are unlikely to be background galaxies and the
success rate of analyseldac drops for large objects, mostly because
the centroid may vary with isophote radius. We choose the upper
galaxy size limit to be the r_g radius where the success rate drops
below 75%. This removes (2-10)% of the objects that otherwise pass
all lensing criteria, with higher rates at the cluster center.

Bright magnitude cut: The brightest galaxies in each field are un- 
likely to lie behind the cluster. We therefore remove galaxies with 
a magnitude brighter than 22 in the detection band. 

COSMOS completeness limit cut: The COSMOS-30 photometric
catalog is publicly available to i+ < 25.0 (Ilbert et al. 2009).
For objects fainter than this, the uncertainty in the COSMOS-30
photometric redshifts is large (Ilbert et al. 2009), and the outlier
fraction may be significant (Schrabback et al. 2010). Therefore, we
limit our catalogs to the same depth, even when our data are
substantially deeper. For fields not observed in i+, we interpolate the i+
magnitude from the best-fit BPZ template (for all cluster fields,
regardless of filter coverage). This faint cut applies to both the
color-cut analysis and the photo-z based analysis, as we use COSMOS-30
to verify the performance of photo-z measurements.

Red sequence cut: The measured ellipticities of galaxies lying on
the cluster red sequence are not sensitive to the mass of the cluster,
and will dilute the average measured shear if not removed. The use
of a color-magnitude diagram (CMD) is an efficient way to identify
and remove these galaxies, independent of photo-z's. However,
a single color is not a monotonic function of redshift, and so
a generic red sequence band on a CMD can contain non-cluster
galaxies. This is particularly true at faint magnitudes, where the
majority of our lensing sample lies. Furthermore, with increasing
cluster redshift, the faint end of the cluster red sequence is not well
populated (e.g. De Lucia et al. 2007). We identify the red sequence
simultaneously in two color-magnitude diagrams (Fig. 3). Galaxies
are only removed from the catalog if they lie on the red sequence
in both diagrams.

Radial distance cut: The NFW halo model does not provide an
adequate description of the mass distribution beyond the virial
radius of galaxy clusters (e.g., Becker & Kravtsov 2011). Also, near
the cluster center, we expect increased cluster galaxy contamination,
and shears departing from the weak-lensing approximation.
In addition, we are only able to calibrate our shear measurements
over the narrow range of shears probed by the STEP2 program
(Massey et al. 2007). Therefore, we accept galaxies within
a projected range 750 kpc < r < 3 Mpc, which is approximately
equivalent to the X-ray measured 0.5 r_500 < r < 2.0 r_500 for
these massive clusters (Mantz et al. 2010b). By removing the centers
of clusters, we are also less sensitive to profile miscentering
(Mandelbaum et al. 2010; von der Linden et al. 2012).

Figure 3. (V_J - R_C) vs. R_C (top panel) and (R_C - I_C) vs. R_C (bottom
panel) color-magnitude diagrams for the MACSJ0417.5-1154 field. The
band within which we select the red sequence is shown by the blue lines.
Galaxies are classified as being in the red sequence if they fall within the
band in both diagrams (shown as red symbols). These galaxies are excluded
from both the P(z) and color-cut methods. The green line shows the
completeness limit of the lensing band (see Fig. 2). The black lines illustrate the
completeness limits of the other two filters in these diagrams.
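The catalog cuts of Section 3 can be summarized as a set of boolean masks. The following is a minimal sketch rather than the pipeline used in this work: the catalog is assumed to be a dictionary of NumPy arrays with hypothetical key names, the upper galaxy-size limit is omitted, and the photo-z cuts (a point estimate z_b between z_cluster + 0.1 and 1.25, and a 2.5 to 97.5 percentile width of P(z) below 2.5) are applied only when requested, as they are used for the P(z) analysis only.

```python
import numpy as np


def percentile_width(z_grid, pz, lo=0.025, hi=0.975):
    """Redshift interval between the lo and hi percentiles of a tabulated P(z)."""
    cdf = np.cumsum(pz, dtype=float)
    cdf /= cdf[-1]
    z_lo, z_hi = np.interp([lo, hi], cdf, z_grid)
    return z_hi - z_lo


def lensing_sample_mask(cat, r_h_star, z_cluster, use_photoz=False):
    """Boolean mask of galaxies that pass the cuts described in Section 3.

    `cat` maps hypothetical column names to equal-length NumPy arrays;
    `r_h_star` is the median stellar half-light radius of the image.
    """
    # Lensing quality cuts.
    keep = cat["snr"] > 3.0
    keep &= cat["r_h"] > 1.15 * r_h_star
    keep &= cat["r_g"] > 1.5
    keep &= np.abs(cat["g"]) < 1.4
    keep &= cat["P_g"] > 0.1
    # Bright magnitude cut and COSMOS-30 completeness limit.
    keep &= cat["mag_detect"] >= 22.0
    keep &= cat["mag_iplus"] <= 25.0
    # Red sequence: removed only if on the sequence in both CMDs.
    keep &= ~(cat["on_rs_cmd1"] & cat["on_rs_cmd2"])
    # Radial distance cut, projected radius in Mpc.
    keep &= (cat["r_mpc"] > 0.75) & (cat["r_mpc"] < 3.0)
    if use_photoz:
        keep &= cat["z_b"] <= 1.25
        keep &= cat["z_b"] >= z_cluster + 0.1
        keep &= cat["pz_width95"] < 2.5
    return keep
```

Here `pz_width95` would be filled per galaxy with `percentile_width`; a flat P(z) over 0 < z < 4 has a width of about 3.8 and is rejected, whereas a Gaussian P(z) with σ = 0.1 has a width of about 0.39 and is kept.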



Photo-z cuts: For the P(z) analysis only, we make cuts on the
measured photo-z's. We remove galaxies with a BPZ single point
redshift estimate z_b > 1.25. This is due to degraded performance in
the photo-z's as the 4000 Å break moves out of the z+ band. We also
remove galaxies with z_b < z_cluster + 0.1 to remove foreground and
cluster contamination. Finally, we only include objects where the
difference between the 2.5 and 97.5 redshift percentiles, measured
by the P(z), is less than 2.5. This excludes objects with effectively
no redshift constraint, for which P(z) is dominated by the BPZ
prior. Note that we do not enforce a cut on the BPZ ODDS parameter,
which is itself a measure of the redshift posterior probability
concentration around the most likely value. We find no statistically
significant systematic shift in our measured masses when we enforce
such a cut at ODDS > 0.7 or at ODDS > 0.9.



4 LENSING MASSES WITH THE "COLOR-CUT" METHOD

In this section, we present a traditional "color-cut" analysis,
employing three-filter observations, for all clusters in the sample. In
the color-cut method, the average redshift of the lensed galaxy
population is measured from a statistically matched subset of galaxies
in a reference deep field where spectroscopic or high-quality
photometric redshifts are available. The color-cut method has the
advantage that it can be applied to fields with a modest filter coverage.
However, the relative lack of color information leads to a shear
dilution from cluster galaxies, for which we need to correct.

Color-cut methods have been used extensively in previous
studies (Hoekstra 2007; Okabe et al. 2010; Hoekstra et al. 2011),
although the details of the implementations differ in some regards
from what we present here.



4.1 Defining and Applying Color Cuts

For our reference deep field, we use the COSMOS-30 photometry
catalog with photometric redshifts determined from all available
bands, as described in Ilbert et al. (2009). Although the COSMOS
field is a statistically limited sample with respect to cosmic variance
effects, it is the best-suited reference field for our study because of
the depth to which photometric redshifts are complete (i+ < 25),
and the overlap in filter coverage and data quality with our
observations.

To avoid a statistical mismatch between the galaxy populations
selected from our cluster fields and the sample presented in
Ilbert et al. (2009), we must first apply all of the cuts from Section 3
to the COSMOS-30 catalog. Below, we describe additional
photometry cuts that are also required for the color-cut analysis, and
proxies for cuts based on measurements from analyseldac, which
are not available for the COSMOS field.

Completeness cut in the detection band: To emulate the effects
of the analyseldac S/N > 3 cut, we reject all galaxies in both
our catalogs and the COSMOS-30 catalog fainter than the limiting
magnitude in our detection band for a given cluster. The limiting
magnitude of the galaxy sample in a cluster field is defined as the
magnitude where the number counts turn over (see Fig. 2). We
define the limiting magnitude after the i+ < 25 cut has already been
applied.



Completeness in colors: Because we use three filters for the
identification and removal of red sequence galaxies, we must match the
detection limits in these filters with the COSMOS catalog (Fig. 3).
Since there are no lensing quality constraints on these filters from
our observations, the completeness limit is typically considerably
deeper than the lensing limit, and this cut removes only a few
objects.

Size cut proxy: We emulate the effects of the size cut employed
in each cluster field by determining the median SExtractor half-light
radius of the objects flagged as stars in the COSMOS Subaru
photometry, and rejecting objects with FLUX_RADIUS <
1.15 FLUX_RADIUS*. There is significant scatter between the
analyseldac measured galaxy half-light radii and the ones reported
in the COSMOS-30 catalog (FLUX_RADIUS, measured with
SExtractor). The inability to match the size cut precisely is a source
of systematic error in the mean mass of the galaxy cluster sample.
We return to this issue in Section 4.4.
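Two of the proxy cuts applied to the reference catalog lend themselves to a short illustration: the detection-band limiting magnitude, defined as the magnitude at which the number counts turn over, and the size cut proxy based on the median stellar FLUX_RADIUS. The sketch below is one possible implementation under simple assumptions (the turnover is taken as the center of the most populated histogram bin, with a hypothetical bin width of 0.1 mag); the function and argument names are illustrative only.

```python
import numpy as np


def limiting_magnitude(mags, bin_width=0.1):
    """Magnitude at which the number counts turn over (histogram peak)."""
    mags = np.asarray(mags, dtype=float)
    n_bins = max(1, int(np.ceil((mags.max() - mags.min()) / bin_width)))
    counts, edges = np.histogram(mags, bins=n_bins)
    peak = int(np.argmax(counts))
    return 0.5 * (edges[peak] + edges[peak + 1])


def matched_reference_mask(ref_mag, ref_flux_radius, ref_is_star, mag_limit):
    """Reference-field galaxies that pass the magnitude and size proxy cuts.

    Objects fainter than `mag_limit` are rejected, as are objects with
    FLUX_RADIUS below 1.15 times the median FLUX_RADIUS of flagged stars.
    """
    star_size = np.median(ref_flux_radius[ref_is_star])
    return (ref_mag <= mag_limit) & (ref_flux_radius >= 1.15 * star_size)
```

In practice the limiting magnitude would be measured once per cluster field, after the i+ < 25 cut, and then passed as `mag_limit` when selecting the matched COSMOS-30 subsample for that field.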



4.2 Contamination Correction 

The remaining cluster field catalogs still contain some cluster 
galaxies, predominantly galaxies bluer than the red sequence. 
These galaxies dilute the lensing signal because they are not lensed 
by the cluster, and are not accounted for in the redshift distribution
from the COSMOS-30 deep field. We follow the method of
Hoekstra (2007) to estimate the fraction of contaminating cluster
galaxies by examining the number density profile of objects in the
lensing catalog. The assumption here is that the number density of
background (and foreground) objects not associated with the cluster
is uniform across the field. The number density of cluster-member
galaxies, on the other hand, increases towards the cluster center.

For this measurement, one has to be careful to take into ac- 
count effects that mimic a decrease/increase in the number density 
of measured objects as a function of cluster radius, which are at 
least as large as the density effects induced by cluster magnifica- 
tion. For instance, cluster galaxies obscure a fraction of the back- 
ground galaxies, with the fraction increasing towards the cluster 
center. Left unaccounted for, this would lead to an underestimate 
of the cluster galaxy contamination, or even an apparent depletion 
in the number density of background objects. When deriving the 
number density profiles, it is therefore essential to track the areas
masked by image artifacts and objects rejected from the background
catalog. (For each of these objects, the masked area is taken
as the ISOAREA_IMAGE SExtractor output parameter.) The area
masked by other objects, mostly bright objects and red sequence
galaxies (Fig. 4), is typically < 5% at large radii, but ~ 10% at
0.5 r_500.

The 'lensing quality cuts' have an additional effect on the 
number density profiles of background objects, in that objects with 
close neighbors are less likely to have shape measurements of ac- 
ceptable quality. This lowers the observed number of background 
galaxies in the lensing sample near the cluster center, below the
density extrapolated from the cluster outskirts. In order to account for
this, we derive the contamination correction from galaxy catalogs 
prior to applying the lensing cuts. However, the galaxy catalog from 
which the contamination correction cut is determined should have 
statistical properties as close as possible, in terms of brightness, 
color, and size distribution, to the final lensing catalog, so that the 
contamination fraction is the same for both - analogous to the need 
for proxy cuts for the COSMOS-30 catalog in the previous section. 
Here, however, we have considerably more information about each 
object, and can refine the proxy cuts developed previously to be 
more accurate. 

As before, the effects of the S/N cut can be mimicked by applying a
limiting magnitude cut. The size cut is mimicked as follows. For those
objects with $r_h < 1.15\,r_h^*$, we determine the 33rd percentile in
FLUX_RADIUS, FWHM_IMAGE, and major (A) and minor (B) axis lengths. We
then remove objects in the full catalog which are smaller than the 33rd
percentile in any of these four variables. In addition, we remove objects
with CLASS_STAR > 0.99. These proxies remove (50-60)% of the objects
caught by the $r_h < 1.15\,r_h^*$ cut; however, only ~ 2% of objects with
$r_h > 1.15\,r_h^*$ are removed.
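The proxy size cut described above can be sketched in a few lines. This is a minimal illustration, not the pipeline used for the paper: the catalog layout (a list of dicts), the column names A_IMAGE and B_IMAGE for the major and minor axes, and the function names are assumptions made for the example.

```python
import math

# Assumed SExtractor-style column names for the four size measures.
SIZE_KEYS = ("FLUX_RADIUS", "FWHM_IMAGE", "A_IMAGE", "B_IMAGE")

def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def proxy_size_cut(catalog, r_star, q=33.0, class_star_max=0.99):
    """Apply the SExtractor-based proxy for the r_h > 1.15 r_h^* lensing cut.

    catalog: list of dicts holding SIZE_KEYS, "CLASS_STAR" and "r_h"
             (hypothetical layout; r_h calibrates the thresholds).
    Returns the surviving objects and the per-column thresholds.
    """
    # Thresholds come from the objects that the true size cut would reject.
    small = [o for o in catalog if o["r_h"] < 1.15 * r_star]
    thresholds = {k: percentile([o[k] for o in small], q) for k in SIZE_KEYS}
    # An object is removed if it falls below the threshold in ANY column,
    # or if it looks stellar.
    kept = [o for o in catalog
            if all(o[k] >= thresholds[k] for k in SIZE_KEYS)
            and o["CLASS_STAR"] <= class_star_max]
    return kept, thresholds
```

Because the thresholds are derived only from SExtractor outputs, the same cut can be applied to catalogs for which no shape measurement was attempted.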

Although the expected increase in galaxy number density towards the
cluster core due to contamination is evident in most fields, the number
counts are generally too noisy to reliably estimate the contamination
fraction in each cluster independently, i.e. the counts are affected by
correlated structures in the field. We follow Hoekstra (2007) and
determine an average contamination fraction for the cluster sample.
Unlike Hoekstra (2007), however, we do not estimate the background
number density simply from the outer annuli. Nearby clusters will fill
most of the SuprimeCam field, preventing reliable measurements of the
background population density. Instead, we fit the number density
profile of each cluster with a function of the form

$$ f_{\rm cl}(r) = \frac{n_{\rm cluster}(r)}{n_{\rm cluster}(r) + n_{\rm background}} = f_{500}\, e^{1 - r/r_{500,X}} . \qquad (3) $$

All clusters are fitted simultaneously. The fractional contamination at
$r_{500,X}$, $f_{500}$, is linked across clusters, whereas
$n_{\rm background}$ is free for each field. We assume Gaussian errors
and fit the model with $\chi^2$ minimization. Since the cluster core is
not fit in the lensing analysis, we restrict the fit to the projected
radius $R > 0.3\,R_{500,X}$. For this purpose only, we use a provisional
value for $r_{500,X}$ from Mantz et al. (2010a). By scaling with
$R_{500,X}$, we account for the mass range of the clusters. The noise
induced by cosmic variance from field to field prevents us from fitting
any functional dependence of $f_{500}$ (e.g., on redshift, (scaled)
cluster mass, observing filter, or limiting magnitude).
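A joint fit of this kind can be illustrated as follows. The sketch assumes that the observed number density is $n(r) = n_{\rm background} / (1 - f_{\rm cl}(r))$, which follows from the definition of $f_{\rm cl}$ as the cluster fraction of the total counts. For a fixed shared $f_{500}$, each field's background density is a linear parameter and is solved analytically, so only a one-dimensional search remains. The grid search, function names, and data layout are illustrative choices, not the authors' implementation.

```python
import math

def f_cl(x, f500):
    """Contamination fraction at scaled radius x = r / r_500,X."""
    return f500 * math.exp(1.0 - x)

def chi2_at(f500, fields):
    """Chi^2 summed over fields for a given shared f500.

    fields: list of (x, n, sigma) tuples of equal-length lists, one per cluster.
    Each field's n_background is the weighted least-squares solution.
    """
    total, n_bgs = 0.0, []
    for x, n, sig in fields:
        h = [1.0 / (1.0 - f_cl(xi, f500)) for xi in x]   # n(r) = n_bg * h(r)
        num = sum(ni * hi / si ** 2 for ni, hi, si in zip(n, h, sig))
        den = sum(hi * hi / si ** 2 for hi, si in zip(h, sig))
        n_bg = num / den
        n_bgs.append(n_bg)
        total += sum(((ni - n_bg * hi) / si) ** 2 for ni, hi, si in zip(n, h, sig))
    return total, n_bgs

def fit_f500(fields, x_min=0.3, lo=0.0, hi=0.3, steps=3001):
    """Grid search for the shared f500, excluding the core (x <= x_min)."""
    cut = []
    for x, n, sig in fields:
        keep = [i for i, xi in enumerate(x) if xi > x_min]
        cut.append(([x[i] for i in keep], [n[i] for i in keep],
                    [sig[i] for i in keep]))
    grid = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    best = min(grid, key=lambda f: chi2_at(f, cut)[0])
    return best, chi2_at(best, cut)[1]
```

Uncertainties would then come from repeating the fit on bootstrap resamples of the list of fields.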

The best-fit contamination fraction for the full cluster sample is
$f_{500} = (8.6 \pm 0.9)$%, where the uncertainties quoted here and
below are based on bootstrapping the cluster sample. Because of the
presence of systematic, unmodeled scatter (since for individual
clusters, the adopted model is not necessarily a good description),
none of the fits are formally acceptable. Restricting the fit to the
cosmology sample alone, the best-fit fraction is
$f_{500}^{\rm cosmo} = (7.9 \pm 1.2)$%. We have chosen to fit an
exponential, rather than a 1/r, profile: the former provides a slightly
better fit than the latter (see footnote 1).

To test for a possible dependence of $f_{500}$ on cluster redshift, we
split the sample in half at z = 0.38. The best-fit contamination
fractions are then $f_{500}^{z<0.38} = (7.3 \pm 1.3)$% and
$f_{500}^{z>0.38} = (9.9 \pm 1.2)$%. To estimate the significance of
this dependence, we repeatedly split the sample into two random,
equally-sized sets and measure the difference in $f_{500}$,
$\Delta f_{500}$. In 17% of the samples, the observed $\Delta f_{500}$
is larger than when the sample is split into low- and high-redshift
halves. Another way of evaluating the significance of a redshift
dependence is to fit $f_{500}$ for each cluster individually and test
the correlation with redshift. We bootstrap the sample and measure the
Pearson, Spearman, and Kendall correlation coefficients. The probability
to find a correlation coefficient randomly greater than zero is ~ 83%
for all three correlation measures. We conclude that $f_{500}$ does not
have a significant dependence on redshift, to the limits of our data.
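The random-split test can be sketched as below. For brevity the sketch compares the means of per-cluster values in each half, whereas the text refits the joint model for each half; the `estimator` argument is a hypothetical hook where such a refit could be plugged in.

```python
import random

def mean(values):
    return sum(values) / len(values)

def split_significance(values, redshifts, z_split, n_trials=2000,
                       estimator=mean, seed=0):
    """Compare the redshift split against random equal-size splits.

    Returns the observed |difference| between the high- and low-redshift
    halves, and the fraction of random splits with a larger |difference|.
    """
    low = [v for v, z in zip(values, redshifts) if z < z_split]
    high = [v for v, z in zip(values, redshifts) if z >= z_split]
    observed = abs(estimator(high) - estimator(low))

    rng = random.Random(seed)
    idx = list(range(len(values)))
    half = len(values) // 2
    exceed = 0
    for _ in range(n_trials):
        rng.shuffle(idx)
        a = [values[i] for i in idx[:half]]
        b = [values[i] for i in idx[half:]]
        if abs(estimator(a) - estimator(b)) > observed:
            exceed += 1
    return observed, exceed / n_trials
```

A large returned fraction, such as the 17% quoted in the text, indicates that the redshift split is not unusual compared with arbitrary splits of the sample.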

As described above, the cluster galaxy contamination fractions 
are estimated from catalogs that approximate the final lensing sam- 
ple without applying criteria depending on the "success rate" of the 
shape measurement algorithm (as this depends on local density and 
thus distance from the cluster center). For comparison, the
contamination fraction inferred directly from the final lensing catalogs
is $f_{500} = (4.8 \pm 1.6)$%. This serves as a lower bound to the true
contamination fraction.



4.3 Color-Cut Mass Estimates 

Mass measurements with the color-cut method use bootstrapped galaxy
samples drawn from the individual cluster fields. For each bootstrap
realization, we calculate the weighted average tangential shear
$\langle \hat{g}_t \rangle_i$ in radial bins $i$ spanning the range
750 kpc to 3.0 Mpc. These bins are chosen to contain an approximately
equal number of galaxies, with at least 300 galaxies in each bin. For
fields with fewer than 1800 galaxies available, we fix the number of
bins to six, again with an equal number of galaxies in each. Only two
clusters have fewer than 1800 galaxies available. Each galaxy is
weighted by the inverse of the variance of the distribution
$P(\hat{g}|g)$ for the galaxy's S/N from analyseldac (which is otherwise
marginalized over in the P(z) method, Section 6).

1 A projected 1/r profile implies a 3D $1/r^2$ profile. Although the
total number of galaxies roughly follows a $1/r^2$ profile, the fraction
of those galaxies that are not on the red sequence declines sharply
towards the cluster core (e.g. von der Linden et al. 2010). The
exponential profile is shallower and provides a better match to
expectations.
Figure 4. Left: The number density profile of objects in the MACSJ0417.5-1154 field as a function of distance from the cluster, for several lensing catalog
selection criteria. The black filled points show the number density of objects with $22 < m_{R_C} < 24.8$, the completeness limit of the lensing catalog in this field
(see Fig. 2). For the green points, objects smaller than the PSF size are removed using only criteria based on SExtractor output parameters - note how this
removes a roughly constant number density of objects. The red points show the number counts after applying the red sequence cut; this predominantly removes
objects close to the cluster center. For the blue points, objects with large half-light radii are removed, since these are likely to be foreground or cluster galaxies,
and the shape measurement is often compromised, e.g., by centroid shifts. The blue number density profile is the basis of the contamination correction, as all
the previous selection criteria are largely independent of local galaxy density. The black open points show the number density of objects for which the shape
measurement is robust. Because this can be compromised by close neighbors, the shape measurement fails more frequently for objects near the cluster center.
The orange points show the number density profile after the lensing cuts (S/N > 3, $r_h > 1.15\,r_h^*$) have been applied - note how this profile is much flatter
than the one based only on SExtractor criteria (blue), underestimating the contamination of cluster galaxies. The lines at the bottom of the figure indicate for
each number density profile the fraction of pixels obscured by masks or bright objects excluded from the sample. The number densities have been corrected
for this obscuration. Right: The number density profiles used for the contamination correction (blue points) for three clusters, along with the best-fit joint
contamination profile (red line). The fraction of cluster galaxies is constrained to be the same at $r_{500,X}$ across the sample (indicated by the vertical dotted lines),
whereas the background number density (shown as green horizontal line) is a free parameter for each cluster.



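The measured tangential shear at radius $r_i$ of the $i$th bin is then
corrected for cluster galaxy contamination according to Eq. 3:

$$ \langle \hat{g}_t \rangle_i^{\rm corrected} = \frac{\langle \hat{g}_t \rangle_i}{1 - f_{\rm cl}(r_i)} = \frac{\langle \hat{g}_t \rangle_i}{1 - f_{500}\, e^{1 - r_i/r_{500,X}}} . \qquad (4) $$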

At this stage we also incorporate the statistical uncertainty on the
contamination correction by sampling $f_{500}$ from its posterior
distribution for each bootstrap realization.
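As a concrete illustration of the correction, the sketch below divides each binned shear by one minus the contamination fraction $f_{500}\, e^{1 - r/r_{500,X}}$ and draws $f_{500}$ once per bootstrap realization. Treating the $f_{500}$ posterior as a Gaussian with the quoted mean and width (8.6% and 0.9%) is an assumption of the example; the function names are illustrative.

```python
import math
import random

def contamination_fraction(r, r500, f500):
    """Cluster-member fraction at projected radius r (exponential profile)."""
    return f500 * math.exp(1.0 - r / r500)

def correct_shear(g_t, r, r500, f500):
    """Boost a binned shear to undo dilution by unlensed cluster members."""
    return g_t / (1.0 - contamination_fraction(r, r500, f500))

def corrected_realization(g_bins, r_bins, r500, rng,
                          f500_mean=0.086, f500_sigma=0.009):
    """One bootstrap realization: draw f500, then correct every radial bin."""
    # Assumption: the f500 posterior is approximated by a Gaussian.
    f500 = rng.gauss(f500_mean, f500_sigma)
    return [correct_shear(g, r, r500, f500) for g, r in zip(g_bins, r_bins)]

# Example: three radial bins (Mpc) for a cluster with r500 = 1.3 Mpc.
rng = random.Random(1)
print(corrected_realization([0.08, 0.05, 0.03], [0.8, 1.5, 2.5], 1.3, rng))
```

The correction is largest in the innermost bin and decays exponentially with radius, so it mainly steepens the inner part of the fitted shear profile.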

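The corrected tangential shear is azimuthally averaged in each radial
bin. The average tangential shear measures the quantity (similar to
Eq. 1)

$$ \langle \hat{g}_t(r) \rangle = \left\langle \frac{\beta_s\, \gamma_{t,\infty}(r)}{1 - \beta_s\, \kappa_\infty(r)} \right\rangle . \qquad (5) $$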

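Without individual redshift estimates, Eq. 5 cannot be computed.
However, with knowledge of the expected galaxy redshift distribution
for the cluster field, the right-hand side of Eq. 5 can be approximated
by (Seitz & Schneider 1997; see footnote 2)

$$ g_{t,i}^{\rm model} = \frac{\langle \beta_s \rangle\, \gamma_{t,\infty}^{\rm model}(r_i)}{1 - \frac{\langle \beta_s^2 \rangle}{\langle \beta_s \rangle}\, \kappa_\infty^{\rm model}(r_i)} . \qquad (6) $$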

In the color-cut method, $\langle \beta_s \rangle$ and
$\langle \beta_s^2 \rangle$ are calculated from the redshifts of the
galaxies in the reference field (in our case the COSMOS field) and
assumed to be the same in the cluster fields. For each bootstrap
realization we draw a random pair of $\langle \beta_s \rangle$ and
$\langle \beta_s^2 \rangle$ values measured on the COSMOS field in an
annulus of the same angular size as the radial fit range for each
cluster (Sect. 4). This partially accounts for sample variance
associated with the limited area entering the fit, but can only be a
lower limit due to the limited size of the COSMOS field.
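The moment-based approximation can be compared directly with the exact average over individual galaxies. In the sketch below the per-galaxy $\beta_s$ values are taken as given inputs (computing them from redshifts requires the cosmology-dependent distance ratios defined with Eq. 1); the function names and the three-galaxy example are purely illustrative.

```python
def beta_moments(betas):
    """Return (<beta_s>, <beta_s^2>) for a reference-field galaxy sample."""
    n = len(betas)
    return sum(betas) / n, sum(b * b for b in betas) / n

def reduced_shear_moments(gamma_inf, kappa_inf, mean_b, mean_b2):
    """Moment-based approximation: <b> gamma / (1 - (<b^2>/<b>) kappa)."""
    return mean_b * gamma_inf / (1.0 - (mean_b2 / mean_b) * kappa_inf)

def reduced_shear_exact(gamma_inf, kappa_inf, betas):
    """Exact average of the per-galaxy reduced shear."""
    return sum(b * gamma_inf / (1.0 - b * kappa_inf) for b in betas) / len(betas)

# Hypothetical beta_s values and lens quantities at one radius.
betas = [0.2, 0.4, 0.6]
mean_b, mean_b2 = beta_moments(betas)
approx = reduced_shear_moments(0.1, 0.2, mean_b, mean_b2)
exact = reduced_shear_exact(0.1, 0.2, betas)
print(approx, exact)
```

For weak shear and convergence the two values agree closely, which is why only the first two moments of the $\beta_s$ distribution are needed in the color-cut method.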

To find the best-fit mass, we minimize for each bootstrap realization

$$ \chi^2 = \sum_i \frac{\left( \langle \hat{g}_t \rangle_i - g_{t,i}^{\rm model}(r_s) \right)^2}{\sigma_i^2} \qquad (7) $$

2 Note that there is a typo in Eq. (4.14) of Seitz & Schneider (1997),
suggesting that the correction factor is something other than
$\langle \beta_s^2 \rangle / \langle \beta_s \rangle$. Hoekstra et al.
advocate a similar, though not identical, approximation; we find that
the Hoekstra et al. approximation biases the masses at z > 0.5 by ~ -4%,
using the simulations described in Section 7. We measure a bias of
0.0070 ± 0.0022 when using Eq. 6, which is not redshift dependent. While
this small bias may be mitigated even further by including higher order
moments of the z distribution in Eq. 6 (Seitz & Schneider 1997), other
sources of systematic uncertainty remain that are more difficult to
characterize.
with respect to the scale radius $r_s$ of the NFW profile, keeping the
concentration fixed at $c_{200} = 4$. The parameter $\sigma_i^2$ is the
variance of the weighted mean shear in each bin. From the best-fit
profile, we calculate the mass within 1.5 Mpc.

The distributions of bootstrapped masses for individual clusters are
very close to Gaussian. We quote the medians of these distributions as
our 'best-fit' masses from the color-cut method, with the 16% and 84%
percentile limits as the lower and upper error bars. The statistical
uncertainties on the mass estimates are entirely dominated by shot noise
from the galaxy ellipticities; incorporating the statistical
uncertainties on $\langle \beta_s \rangle$, $\langle \beta_s^2 \rangle$,
and $f_{500}$ as we have done here only marginally increases the error
budget.
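The summary statistics quoted for each cluster can be computed from the bootstrap distribution as sketched here. The linear-interpolation percentile and the function names are choices made for the example.

```python
import math

def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of a sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def summarize_bootstrap(masses):
    """Return (median, lower error, upper error) from bootstrap masses.

    The error bars are the distances from the median to the 16th and
    84th percentiles, so they need not be symmetric.
    """
    s = sorted(masses)
    med = percentile(s, 50.0)
    return med, med - percentile(s, 16.0), percentile(s, 84.0) - med
```

For a Gaussian distribution the two error bars coincide with the standard deviation, consistent with the near-Gaussian bootstrap distributions noted above.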



4.4 Systematic Uncertainties on the Mean Sample Mass in 
the Color-Cut Method 

The "color-cut" analysis has systematic uncertainties associated with
calculating $\langle \beta_s \rangle$ and the contamination correction,
in addition to the uncertainties from shear estimation and the assumed
mass model (see Section 9). The overall level of systematic uncertainty
in the color-cut method can be difficult to quantify, and has generally
not been quantified in previous lensing efforts.

Cosmic Variance: Cluster masses are roughly linearly proportional to
$\langle \beta_s \rangle$, measured from a common deep field. Deep
fields that are unrepresentative of the average cluster field will lead
to a bias in the average cluster mass. (For studies of individual
clusters, any deep field will always lead to a biased mass.) In
practice, Mahdavi et al. (2008) reported a ~ 10% shift in their cluster
masses from Hoekstra (2007), based on the same data, simply due to using
the larger CFHT Deep Legacy Survey (Ilbert et al. 2006) in place of the
Hubble Deep Fields (Fernández-Soto et al. 1999) as their reference. In
principle, the systematic uncertainty expected on using the 2-sq. degree
COSMOS field may be estimated from simulations (see footnote 3).

Galaxy Sample: The selection criteria for lensed galaxies must be
accurately matched to the reference deep field catalog to ensure an
unbiased $\langle \beta_s \rangle$ estimate. As an illustrative example,
we consider a galaxy size cut where we only accept objects with sizes
15% larger than the PSF size to minimize stellar contamination. To test
how well applying an equivalent cut with respect to the COSMOS PSF size
recovers $\langle \beta_s \rangle$, we examined a cluster field, with
photometric redshifts available, that was observed twice, with both 0.4
and 0.6 arcsecond seeing. The $\langle \beta_s \rangle$ values estimated
from the two samples, after cutting with respect to the different PSF
sizes but using identical detection and photo-z catalogs, differ by up
to 5%, depending on the redshift of the cluster. The seeing for the
COSMOS field, at 0.7 arcseconds, is larger than our average seeing of
0.6 arcseconds, which could impart up to a 3% systematic bias for the
sample. We have not studied how other analysis procedures (e.g., filter
transformations or lensing cut approximations) may impact
$\langle \beta_s \rangle$ estimation.

Contamination Correction: The contamination correction results in a
≈ 6% correction to each cluster mass using the color-cut method. As
demonstrated previously, this correction is sensitive to details in the
derivation, such as accounting for foreground and cluster galaxies,
masking background galaxies, and the assumed contamination profile. The
correction, as implemented, does not account for "sheets" of cluster
galaxies that could exist in filaments and pancakes extending from the
cluster to the edge of the field of view. Currently, no publicly
available image simulations exist to test the accuracy of the
contamination correction procedure. Such simulations will be
challenging to perform robustly, requiring a detailed understanding of
galaxy evolution and the impact of galaxy-cluster interactions.

Without rigorous quantification of these and other systematic 
uncertainties, or without an external calibration from a method with 
quantified systematic uncertainties, color-cut style weak-lensing 
masses have limited value for calibrating other cluster mass prox- 
ies. The effects discussed here can easily shift the mean cluster 
mass by 5-10%. While the systematic effects highlighted in this 
section could in principle be modeled given sufficient computer and 
manpower, no effort currently exists, to our knowledge. 



5 LENSING MASSES WITH PHOTOMETRIC REDSHIFT 
PROBABILITY DISTRIBUTIONS 

A statistical model that includes the redshift for each lensed source
should in principle offer the most reliable mass estimates from lensing
data. However, photometric redshift estimates are inherently noisy and
are subject to systematic uncertainties. The posterior probability
distribution P(z), returned by standard photo-z codes, gives the
relative probability that a given source is at a particular redshift.
This posterior distribution contains more information than a simple
point estimate, in particular when multiple redshift solutions match the
observed galaxy colors. In principle, we can use the P(z) for a given
galaxy as a weighting function when comparing the observed and predicted
shear for that galaxy. Here, however, we must also model the biases and
scatter between the expected shear $g$, given a particular redshift and
halo model (Eq. 1), and the measured shear $\hat{g}$ for each galaxy. We
use a Bayesian approach to incorporate these effects, which we refer to
as the P(z) method in this paper.

Given i) the measured tangential shear $\hat{g}_i$ of one galaxy in the
survey region of a cluster, ii) a physical model for the expected shear
as a function of source redshift and cluster mass $g(z, M)$ (and
implicitly, other properties such as cluster redshift and source
position), and iii) the photometric redshift probability distribution
$P_i(z)$ for that galaxy, then the posterior probability for the mass of
the cluster is:

$$ P(M|\hat{g}_i) \propto P(M)\, P(\hat{g}_i|M) = P(M) \int_0^\infty P(\hat{g}_i|g(z, M))\, P_i(z)\, dz . \qquad (8) $$

Here, $P(M)$ is the prior on the cluster mass and $P(\hat{g}|g(z, M))$
is the likelihood function for the shape measurement, specified below.
By marginalizing over $P_i(z)$, we consistently compare the measured
reduced shear $\hat{g}$ with the proper model value, weighted for the
correct relative probability at each redshift. This may be naturally
extended to many galaxies, with



3 The cosmic variance for the average redshift of deep field galaxies
has been estimated to be ≈ 3% by van Waerbeke et al. (2006). However,
the variance in $\langle z \rangle$ cannot be substituted for the
variance in $\langle \beta_s \rangle$.



We suppress the symbol t to prevent clutter in the rest of the paper. 
$$ P(M|\hat{g}) \propto P(M)\, P(\hat{g}|M) = P(M) \prod_i P(\hat{g}_i|M) = P(M) \prod_i \int P(\hat{g}_i|g(z, M))\, P_i(z)\, dz . \qquad (9) $$
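The structure of this product of redshift-marginalized likelihoods can be made concrete with a small numerical sketch. Two deliberate simplifications are assumed here and are not the choices made in this paper: the shear likelihood is a Gaussian of fixed width, and `toy_shear` is a hypothetical stand-in for the halo model $g(z, M)$. The mass prior is taken to be uniform, so it only adds a constant to the log posterior.

```python
import math

def gauss(x, mu, sigma):
    """Normal probability density."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def toy_shear(z, mass, z_lens=0.3, amp=0.05):
    """Hypothetical g(z, M): zero in front of the lens, rising behind it."""
    return amp * mass * max(0.0, 1.0 - z_lens / z)

def log_posterior(mass, g_obs, pz_list, z_grid, sigma=0.25):
    """Log of prod_i  integral P(g_i | g(z, M)) P_i(z) dz  (uniform prior).

    g_obs:   measured shear per galaxy
    pz_list: per-galaxy P(z) sampled on the uniform grid z_grid
    """
    dz = z_grid[1] - z_grid[0]
    total = 0.0
    for g_i, pz in zip(g_obs, pz_list):
        # Marginalize the shear likelihood over this galaxy's P(z).
        like = sum(gauss(g_i, toy_shear(z, mass), sigma) * p
                   for z, p in zip(z_grid, pz)) * dz
        total += math.log(like)
    return total
```

Evaluating `log_posterior` on a grid of masses and exponentiating gives the mass posterior; galaxies whose P(z) lies mostly in front of the lens contribute an almost mass-independent factor and so carry little weight automatically.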

The likelihood function $P(\hat{g}|g(z, M))$ encodes the scatter and
bias for the shape measurement $\hat{g}$. Physically, the bias is
sourced by calibration errors in the shape measurement. The scatter in
the reduced shear is a convolution of the intrinsic galaxy distribution,
after lensing, with the shape measurement error distribution.
Additional scatter is induced by departures from the assumed mass model.
Figure 5 plots the convolved shape distribution and measurement error,
as well as scatter from assuming a spherical mass model in real systems.
We restrict ourselves to the low shear regime so that we may ignore any
shear dependent, asymmetric scatter (Geiger & Schneider 1998). Though
not shown in Fig. 5, an additional scatter component also arises from
galaxies with redshifts poorly modeled by their P(z) function.

The simplest definition for $P(\hat{g}|g(z, M))$ is a Gaussian; however,
such a model neglects the significant tails in the distributions seen in
Fig. 5. We apply both the posterior predictive cross validation and the
deviance information criterion model comparison tests to STEP2 simulated
data (Massey et al. 2007) to determine the best fit shape for the
scatter. We find that a Voigt profile is a better description of the
scatter than a Gaussian or double Gaussian model. A Voigt profile (a
Gaussian convolved with a Lorentz distribution) has three parameters:
the mean $\mu$, the core width $\sigma$, and the wing amplitude
$\Gamma$.

The mean $\mu$ of the Voigt profile is a function of the predicted shear
at a galaxy's position, given its redshift, and the shear calibration
parameters $m$ and $c$, as defined in the STEP and STEP2 simulations
(Heymans et al. 2006) for the PSF of the observation:

$$ \mu = m({\rm size})\, g(z, M) + c . \qquad (10) $$

The multiplicative bias, $m({\rm size})$, is a piecewise-linear function
depending on the galaxy size, and is described by three parameters. We
use a multivariate normal prior with covariances as measured from STEP2
simulations for $m$ and $c$. See Section 6 for a description of how we
parametrize $m$ and constrain this part of the model.

The Voigt profile scatter parameters $\sigma$ and $\Gamma$ do not depend
on individual galaxy properties as implemented. Instead, we place
uninformative, flat priors on $\sigma$ and $\Gamma$. For simplicity, we
refer to the set of scatter and bias parameters
$m({\rm size}), c, \sigma, \Gamma$ as $\alpha$, with a joint prior
$P(\alpha)$. The values of $\alpha$ can then be marginalized over for
the posterior probability:

$$ P(M|\hat{g}) \propto P(M) \int P(\alpha) \prod_i P(\hat{g}_i|M, \alpha)\, d\alpha = P(M) \int P(\alpha) \prod_i \int P(\hat{g}_i|g(z, M), \alpha)\, P_i(z)\, dz\, d\alpha . \qquad (11) $$

In Eq. 11 we refer only to the mass of the spherical NFW halo, $M$, and
suppress the concentration parameter for clarity. For our analysis, we
assume that $c_{200} = 4$ with 0.11 dex log normal scatter, appropriate
for massive halos, $M_{500} > 10^{14} M_\odot$ (Neto et al. 2007). This
concentration distribution is also marginalized over to determine the
posterior probability for the mass. Finally, we set the prior on mass,
$P(M)$, to be uniform. Note that making the prior uniform in one measure
of mass typically implies a non-uniform prior in other mass estimates.
For this analysis, we measure masses within an aperture of 1.5 Mpc, and
set the prior accordingly.

In summary, our model has eight parameters: two for the halo model (mass
and concentration); four for the STEP shear correction (three
parameterize the size-dependent multiplicative bias, one for the
additive bias); and two to describe the shear scatter ($\sigma$ and
$\Gamma$ of the Voigt profile). All parameters but the mass are
marginalized.

The center of the mass profile and the cluster redshift also en- 
ter our model, but we do not marginalize over these parameters. We 
anchor our profiles on the X-ray centroid for each cluster. See Paper 
I for how the center is defined in each of the clusters. Cluster
redshifts are determined from spectroscopic follow-up with negligible
uncertainties.

We sample the posterior probability distribution in Eq. 11 using Markov
Chain Monte Carlo with an Adaptive Metropolis step algorithm, as
implemented in the PyMC software package (Patil et al. 2010). We
numerically evaluate the marginalization over each galaxy's P(z) using
the Cython extension (Behnel et al. 2009) for computational
optimization.

Other efforts to include photometric redshifts in lensing measurements
exist in the literature. Seitz & Schneider (1997) relate the measured
moments of an ensemble shape distribution (i.e., $\langle g \rangle$) to
an integral over the redshift distribution. This presupposes that all
galaxies in a sample are statistically identical, independent draws from
a common redshift distribution. One could attempt to apply the
Seitz & Schneider (1997) method by calculating an expected shear for
each galaxy based on its P(z) and assuming Gaussian scatter. A similar
approach is taken by Dawson et al. (2012), where an average critical
density is calculated for each galaxy based on its P(z). The standard
deviation of the average critical density, weighted by the P(z), sets
the width of the assumed Gaussian scatter and serves as a per-galaxy
weight in that work. Mandelbaum et al. (2008) also pursue the approach
of an average expected shear and weight per galaxy, optimized for
galaxy-galaxy lensing. While these approaches rightfully downweight
galaxies with poorly constrained P(z) distributions, information is
lost, most notably when a lensed galaxy sits on the steeply rising part
of the $\beta_s$ curve (see Fig. 1). This may most easily be seen by
noting that the residuals with respect to the true redshift and shear
will be correlated, which is neglected in these methods. This
correlation is accounted for with the marginalization in Eq. 8.

The work by Geiger & Schneider (1998), and later King & Schneider
(2001), uses an unbinned, maximum likelihood approach that marginalizes
the P(z) in a similar way to our method. We differ from their work in
how the measured shear relates to the predicted shear. Geiger &
Schneider (1998) make strong assumptions about the intrinsic galaxy
ellipticity distribution and ignore measurement uncertainties and
biases. However, they explicitly include the skew induced in the
scatter from high shear. We restrict ourselves to the low shear regime
where the skew is negligible, and let the data determine the proper
form of the scatter.

Independently of the work presented in this paper, Kitching et al.
(2011) employ P(z) marginalization for 3D cosmic shear measurements.
However, that work does not introduce the $P(\hat{g}|g)$ formalism that
generalizes the relationship between measured shear estimator and true
shear.
[Figure 5 panels: left, shear residual; right, shear B-mode $g_\times$.]

Figure 5. Left: Residual scatter measured in STEP2 simulations after accounting for the average calibration bias. The shape of the residuals is nearly independent
of the scatter profile assumed, when fitting for the average calibration bias. Right: Shear B-modes measured in 27 clusters; the cross terms are expected to
be zero on average. The overplotted red lines are the best fit Voigt profiles for each histogram, while the blue dash-dotted lines are the best fit Gaussian profiles.
The Gaussian model neglects the significant tails present in each distribution, and would therefore place too much weight on outlier shear measurements in a
maximum-likelihood fit.



Table 1. Best fit values for the Voigt profile parameters $\sigma$ and $\Gamma$ in bins of shape S/N. Quoted uncertainties are the 1$\sigma$ marginalized constraints. Values of $\sigma$ in
S/N bins are used to weight shear estimates in the color-cut analysis. Only galaxies with a Lanczos3 S/N > 3 are accepted into the lensing analysis.

STEP S/N        Lanczos3 S/N    PSF A: sigma        PSF A: Gamma            PSF C: sigma        PSF C: Gamma
5.00 - 6.00     2.17 - 2.61     0.24 ± 1.35e-03     2.60e-02 ± 1.47e-03     0.28 ± 1.90e-03     3.23e-02 ± 2.05e-03
6.00 - 8.00     2.61 - 3.48     0.21 ± 7.95e-04     1.54e-02 ± 7.87e-04     0.23 ± 1.08e-03     2.61e-02 ± 1.20e-03
8.00 - 10.00    3.48 - 4.35     0.19 ± 7.95e-04     1.38e-02 ± 7.71e-04     0.20 ± 1.06e-03     2.11e-02 ± 1.13e-03
10.00 - 15.00   4.35 - 6.52     0.19 ± 5.37e-04     3.21e-03 ± 3.54e-04     0.19 ± 7.38e-04     9.50e-03 ± 6.61e-04
15.00 - 20.00   6.52 - 8.70     0.19 ± 8.27e-04     4.30e-03 ± 5.75e-04     0.19 ± 9.98e-04     8.82e-03 ± 8.75e-04
> 20.00         > 8.70          0.22 ± 4.88e-04     9.34e-04 ± 2.25e-05     0.22 ± 5.79e-04     3.19e-03 ± 3.01e-04



6 CALIBRATING THE SHEAR LIKELIHOOD 
FUNCTION 

Various shear estimator algorithms presented in the literature exhibit
biases, with complex dependencies on PSF and galaxy properties. When all
other factors are held constant, KSB+ algorithms (such as the code
analyseldac that we employ) show an approximately linear relationship
with true shear in the low shear regime, g < 0.3 (Erben et al. 2001;
Heymans et al. 2006). The STEP simulation studies (Heymans et al. 2006;
Massey et al. 2007) parameterize this bias as a function of shear using
a multiplicative bias $m$ and an additive bias $c$ (Eq. 10). Massey et
al. (2007) demonstrated explicitly that the parameters $m$ and $c$ are
functions of the point spread function shape, size, and ellipticity, as
well as the lensed galaxy shape S/N (or magnitude) and size.

Our P(z) method described in Section 5 can correct for these biases and
marginalize out any calibration uncertainty through the definition of
the shear likelihood function $P(\hat{g}|g)$. We use the STEP2
simulations to calibrate the likelihood function, taking into account
PSF and galaxy property dependent biases.

STEP2 simulated images mimic lensing-quality SuprimeCam data. Six sets
of images were produced (A-F), using five different PSFs sampled from
Subaru images. Five of these sets used realistic galaxy images derived
from shapelet fits to galaxy morphologies in the COSMOS HST field. Sets
A and C have PSF sizes of 0.6" and 0.75" full width at half maximum,
respectively, spanning the typical seeing range of our lensing images.
The PSF ellipticity in the STEP2 simulations is somewhat smaller than
typical for our images. We quantify the systematic uncertainty from our
use of the STEP2 A and C images in Section 9.

We detect and measure the size and shape of objects in the 
STEP2 images using the same algorithms and cuts employed for 
our data catalogs. To explore the behavior of the STEP m and c pa- 
rameters in bins of various galaxy properties, we perform unbinned 
maximum likelihood fits to the STEP2 catalogs. In all fits, we describe
the scatter in measured versus true shear as a Voigt profile where the
mean is given by $\mu = \hat{g} - (1 + m)\,g - c$. The bias parameters
$m$ and $c$, and the Voigt profile scatter parameters $\sigma$ and
$\Gamma$, have uniform priors. After verifying consistent results, we
fit galaxies
from the STEP2 original and rotated image sets, as well as mea- 
surements of both shear components simultaneously. Uncertainties 
are determined from Markov Chain Monte Carlo exploration of the 
parameter space. 

After our shape S/N > 3 cut to remove false detections (equivalent to
S/N > 12 from hfindpeaks, and a STEP2 S/N > 7; Erben et al. 2001,
Paper I), we see no S/N dependence in our calibration. The color-cut
method uses the best-fit values of $\sigma$, describing the Voigt
profile core width, to weight shear values when computing the average
tangential shear in a radial bin (Section 4.3). The best-fit results for
$\sigma$ and $\Gamma$, in bins of S/N corresponding to Fig. 6, are shown
in Table 1.

We show the results of fitting the STEP m parameter in bins
of galaxy size in Fig. 6. We model the clear size dependence in
the STEP2 data using the following fitting function, constrained
Figure 6. The STEP parameter m as a function of shape S/N and galaxy size for two different PSFs simulated in STEP2 (Massey et al. 2007). Left: Galaxies
are grouped by S/N, as measured by analyseldac. The parameter m is a sensitive function of S/N for values S/N < 6. The measured S/N values depend on
the level of correlated noise in the image; STEP2 simulations exhibit much stronger correlation (by design) than is present in analysis images. Right: Galaxies
are grouped by the half-light size, measured by analyseldac and normalized by the PSF size. Only galaxies with shape S/N > 7 are used for the right figure.
Datapoints show the best fit value of m in bins of normalized size. A dependence on the size of the object is clearly seen. We model the size dependence of
m as linear in size for small objects, and constant for large objects, where the break position is a free parameter in the model. We use an unbinned fit for the
model. (Replicated from Paper I)



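in an unbinned analysis. For the definition of the shear likelihood
function $P(\hat{g}|g)$, we parameterize $m$ as a function of the galaxy
size relative to the PSF size:

$$ m = \begin{cases} a \left( \frac{r_h}{r_{\rm PSF}} - x_p \right) + b & \text{if } \frac{r_h}{r_{\rm PSF}} < x_p \\ b & \text{if } \frac{r_h}{r_{\rm PSF}} \geq x_p \end{cases} , \qquad c = {\rm const} . \qquad (12) $$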

We expect the multiplicative bias $m$ to asymptote to zero for galaxies
much larger than the PSF size. We therefore place a Gaussian prior, with
$\mu = 0$ and width $\sigma = 0.03$, on the parameter $b$ in Eq. 12 when
fitting. We also place a Gaussian prior on the pivot parameter $x_p$,
with $\mu = 2.0$ and width $\sigma = 0.2$. The remaining parameters
($a$ and $c$) have uniform priors. Due to the small number of galaxies
of large size, we do not consider the largest objects in the STEP2
images; the fit is restricted to $r_h / r_{\rm PSF} < 2.5$.
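The size-dependent calibration can be written as a short function. The sketch assumes the continuous piecewise-linear form in which $m$ rises linearly with slope $a$ up to the pivot $x_p$ and equals $b$ above it; the example values are the PSF A best-fit numbers from Table 2, and the function name is illustrative.

```python
def multiplicative_bias(size_ratio, a, b, x_p):
    """Piecewise-linear STEP multiplicative bias m(size).

    size_ratio: galaxy half-light radius divided by the PSF size.
    Below the pivot x_p the bias changes linearly with slope a and joins
    the constant value b continuously at the pivot (assumed form).
    """
    if size_ratio < x_p:
        return a * (size_ratio - x_p) + b
    return b

# PSF A best-fit values from Table 2.
A_FIT = {"a": 0.20, "b": -0.028, "x_p": 1.97}

for ratio in (1.15, 1.5, 1.97, 2.4):
    print(ratio, multiplicative_bias(ratio, **A_FIT))
```

With these values the smallest accepted galaxies, just above 1.15 times the PSF size, carry a multiplicative bias of roughly -19%, while objects larger than about twice the PSF size are biased at the few percent level.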

We approximate the posterior probability distributions from this model
as a multivariate Gaussian, after marginalizing over the scatter
parameters $\sigma$ and $\Gamma$. We use this multivariate Gaussian as a
prior for the $P(\hat{g}|g)$ function in the mass modeling described in
Section 5. Table 2 shows the best fit values and covariance matrix for
the size-dependent shear calibration.

For the P(z) analysis, we linearly interpolate between the results
for PSFs A & C to account for the different seeing in each
cluster field. We do not extrapolate the correction to fields with see- 
ing better than 0.6" or worse than 0.75", but instead use the nearest 
measured correction. Observations with seeing below 0.5" are re- 
moved from our analysis because the PSF is undersampled. By im- 
plementing a PSF-size dependent STEP correction, we eliminated 
what would have been an approximately 8% systematic uncertainty 



Table 2. Posterior covariance matrix and best fit values for the size dependent
shear calibration, defined in Eq. 12 and measured from STEP2 images.
The covariances of the c parameter are small enough that we set those
elements of the matrix to 0. The covariance matrix is approximately the same
for both PSFs considered, and is assumed to be constant for mass
measurements.

Covariance    a           b           x_p         c
a             1.1e-02     -1.4e-04    -1.7e-02    0.0
b             -1.4e-04    6.0e-04     2.9e-03     0.0
x_p           -1.7e-02    2.9e-03     5.1e-02     0.0
c             0.0         0.0         0.0         4e-04

Best Fit      a           b           x_p         c
PSF A         0.20        -0.028      1.97        -1e-05
PSF C         0.22        0.012       1.93        6e-04


in the mean cluster mass, if we had assumed either the STEP set A 
or set C correction exclusively. 
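
The seeing-dependent correction described above can be sketched as a clamped linear interpolation. This is a minimal illustration, not the pipeline used for the analysis: the function and dictionary names are hypothetical, and interpolating each Table 2 parameter separately (rather than the resulting correction) is an assumption of the sketch.

```python
def interpolate_by_seeing(seeing, value_a, value_c,
                          seeing_a=0.60, seeing_c=0.75, min_seeing=0.50):
    """Linearly interpolate a calibration value between STEP2 PSF A and PSF C.

    Outside [seeing_a, seeing_c] the nearest measured value is returned
    (no extrapolation). Fields with seeing below min_seeing are rejected.
    All seeing values are FWHM in arcsec.
    """
    if seeing < min_seeing:
        raise ValueError("seeing below 0.5 arcsec: PSF is undersampled")
    clamped = min(max(seeing, seeing_a), seeing_c)
    weight_c = (clamped - seeing_a) / (seeing_c - seeing_a)
    return (1.0 - weight_c) * value_a + weight_c * value_c


# Best-fit values from Table 2. Interpolating parameter by parameter is an
# assumption made here for illustration only.
PSF_A = {"a": 0.20, "b": -0.028, "x_p": 1.97, "c": -1e-05}
PSF_C = {"a": 0.22, "b": 0.012, "x_p": 1.93, "c": 6e-04}


def calibration_for_field(seeing):
    """Return interpolated calibration parameters for one cluster field."""
    return {key: interpolate_by_seeing(seeing, PSF_A[key], PSF_C[key])
            for key in PSF_A}
```

For example, `calibration_for_field(0.9)` returns the PSF C values unchanged, because the sketch never extrapolates beyond the measured seeing range.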



7 TESTING FOR MASS BIASES WITH 
P(z) RECONSTRUCTIONS 

Photometric redshifts are inherently more noisy than spectroscopic
redshift measurements. The amount of scatter, and the rate of outliers,
are strong functions of galaxy type and magnitude. Photo-z's
computed with a template based code, such as BPZ, produce a posterior
probability distribution P(z) which attempts to characterize
the uncertainty in the measured redshift, within the limits of the
assumed model. In this section, we create simulations based on
the COSMOS-30 catalog to test to what extent uncertainties and
biases in our photo-z measurements propagate to mass measurements,
given the photo-z cuts we apply to our catalogs. In addition,
we use these simulations to quantify the effects of cluster galaxy
contamination in the catalogs, since we do not explicitly model the
presence of a massive cluster when we calculate photo-z's. Finally,
we investigate the performance of mass estimator methods that use
photo-z point estimates rather than the full P(z) function.



7.1 Cosmos-30 Based Simulations 

For each cluster in the sample, we create a simulated cluster field 
with an artificially high density of background galaxies to suppress 
shot noise. Galaxy redshifts and photometry are drawn from the 
COSMOS-30 catalog, and a measured shear is assigned to galaxies 
based on the mass of the simulated cluster. Additional galaxies are 
added to mimic cluster contamination. These artificial catalogs are 
then passed to photo-z and mass measurement algorithms in the 
same manner as real data. 

For our simulations, we assume that the COSMOS-30 redshifts
from Ilbert et al. (2009) represent the 'truth', approximately.
COSMOS-30 used photometry in 30 broad and narrow filters; redshifts
were derived using templates optimized with emission lines to take
advantage of the full filter information. Ilbert et al. (2009) report
photo-z accuracy of σ_z/(1 + z) < 0.012 for z < 1.25 and i+ < 24.
For 24 < i+ < 25, performance degrades to σ_z/(1 + z) < 0.054 with
a catastrophic outlier rate of 20%, where catastrophic is defined as
Δz/(1 + z) > 0.15 (see figure 7 of Ilbert et al. 2009).
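
As a small illustration of the outlier definition quoted above, the following sketch counts catastrophic outliers in a matched catalog. The function name is hypothetical, and taking the absolute value of the offset and normalizing by the reference redshift are assumptions of this sketch.

```python
def photoz_outlier_fraction(z_phot, z_ref, threshold=0.15):
    """Fraction of galaxies with |z_phot - z_ref| / (1 + z_ref) > threshold."""
    if len(z_phot) != len(z_ref) or len(z_ref) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    n_outliers = sum(
        1 for zp, zr in zip(z_phot, z_ref)
        if abs(zp - zr) / (1.0 + zr) > threshold
    )
    return n_outliers / len(z_ref)
```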

Galaxies from the COSMOS-30 catalog (identified by the
same cuts as in Section 3 when possible) are randomly selected with
replacement to form a blank field with a high number density of 
galaxies, allowing us to more easily determine the bias present in 
our measurements by suppressing shot noise. Galaxies are assigned 
a true tangential shear by computing the expected shear appropri- 
ate for a particular NFW halo using Eq. [T] given the known halo 
redshift, the COSMOS-30 redshift, and the position of the galaxy 
relative to the halo. We then assign noise to create the 'measured'
shear. In principle, we could assign scatter following the
shear likelihood that we measured in Section 6. To reduce computational
complexity, we instead assign Gaussian scatter with σ = 0.25.
No calibration bias is included. We have checked that using a Voigt 
profile as the form of the scatter in our simulations does not change 
our conclusions. 
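
The catalog construction described above (resampling with replacement, then adding Gaussian shear scatter) can be sketched as follows. The dictionary layout, the `true_shear` callable, and the function name are hypothetical; the NFW shear prediction itself is left to the caller.

```python
import random


def make_mock_catalog(parent, n_galaxies, true_shear, sigma_g=0.25, seed=0):
    """Resample a parent catalog with replacement and add shear noise.

    parent     : list of dicts describing galaxies (hypothetical layout)
    true_shear : callable returning the noiseless tangential shear of a galaxy
    sigma_g    : standard deviation of the Gaussian shear scatter
    """
    rng = random.Random(seed)
    mock = []
    for _ in range(n_galaxies):
        gal = dict(rng.choice(parent))  # copy, so the parent is not modified
        g_true = true_shear(gal)
        gal["g_true"] = g_true
        gal["g_meas"] = g_true + rng.gauss(0.0, sigma_g)  # no calibration bias
        mock.append(gal)
    return mock
```

Drawing many more galaxies than a real field contains is what suppresses shot noise: the mean of `g_meas` converges on the true shear as `n_galaxies` grows.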

When calculating photo-z's for our cluster fields, we already
know that a massive cluster is in the field. Ideally, this information
should also be incorporated into the P(z) function. Without explicitly
modeling the presence of the cluster, one might expect these
cluster galaxies to bias the measured mass low, as cluster galaxies
scatter into the acceptance catalog. We explore the effects of contaminating
cluster members in our simulation by introducing galaxies
in the simulated catalogs with zero net tangential shear (only
scatter). We select blue galaxies (COSMOS-30 type > 8) within
|Δz| < 0.05 of the halo redshift from COSMOS-30 that pass our
selection criteria and place them into the catalogs following an
exponential number density profile centered on the NFW halo:

n(r) = n_{\rm back} \, f_{500} \, \exp(1 - r/r_{500}) . \qquad (13)

The parameter n_back is the background number density and f_500 is
the contamination fraction at the halo r_500. We simulate halos at
the redshifts and masses of the clusters in our sample, with three
levels of cluster-galaxy contamination. We generate 50 realizations
for each redshift, mass, and contamination set.
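
A direct transcription of Eq. 13 is sketched below; the function name and the choice of units are hypothetical. By construction the contaminant density equals f_500 times the background density at r = r_500.

```python
import math


def contaminant_density(r, n_back, f_500, r_500):
    """Number density of contaminating cluster galaxies at radius r (Eq. 13).

    r and r_500 must share the same length unit; the result has the units
    of n_back.
    """
    return n_back * f_500 * math.exp(1.0 - r / r_500)
```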

We model our photo-z measurements using a subset of the
available wide optical filters in the COSMOS-30 catalog. We follow
the same procedure used to calculate photo-z's in our analysis,
including re-calibrating the photometric zeropoints with stellar locus
regression and using BPZ with modified priors and templates.
See Paper II for an exploration of our photo-z quality.

For our simulation analysis, we accept galaxies to the COSMOS
magnitude limit i+ < 25. We note, however, that COSMOS-30
galaxies are subject to multiple photo-z solutions at the faintest
magnitudes (24.5 < i+ < 25; T. Schrabback, priv. comm.), and point
estimates adopted by the COSMOS team cannot be taken as reasonable
approximations of the truth. The full P(z) posterior probabilities
from the COSMOS-30 study are not published. We have run
alternative simulations with i+ < 24.5 and do not see any significant
impact on our conclusions.

The simulations discussed in this section model the effects
of photometric redshift uncertainties and deficiencies with respect
to cluster galaxies. We emphasize, however, that these simulations
reflect only one realization of the photometric calibration and
photo-z code on one cosmic variance limited field (although we do
not expect much additional scatter for clusters in the studied redshift
range). We do not simulate the effects of large scale structure,
correlated nearby structure, or triaxiality, which are better handled
with N-body simulations, and can be considered a separable problem.
We also emphasize that these are catalog based simulations,
as we do not simulate the problem of measuring shapes from images.
Mass profile and shear measurement issues are addressed in
Section 9.

7.2 P(z) Method Performance

We apply the P(z) method to the COSMOS-30 based simulated
catalogs using the B_J V_J R_C i+ z+ photometry available in the
COSMOS-30 catalog. In these results, we mimic the selection cuts
that we apply to real data. This includes applying a size cut, rejecting
galaxies with a photometric redshift point estimate outside the
range z_cluster + 0.1 < z < 1.25, and rejecting galaxies with very wide
P(z), Δz_95% > 2.5. Since we assign shears to the galaxies directly,
we cannot replicate shape measurement quality cuts. Unless otherwise
stated, we assume that we know the form of the shear scatter
(which for simplicity in the simulations is a Gaussian with standard
deviation σ_g = 0.25). The results are shown in Fig. 7. When using
B_J V_J R_C i+ z+ photometry, the expected bias on the mean mass for any
single cluster mass and redshift combination never exceeds ±5%.
Furthermore, as we show below, the statistical constraints on the
mean bias for the sample are significantly better.
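
The photo-z selection described above can be written as a simple predicate. This is only a sketch: the function and argument names are hypothetical, the size cut is omitted, and `dz_95` stands for the width of the 95% interval of a galaxy's P(z).

```python
def passes_photoz_cuts(z_point, dz_95, z_cluster,
                       z_margin=0.1, z_max=1.25, max_width=2.5):
    """Return True if a galaxy survives the photo-z cuts quoted in the text.

    z_point : photo-z point estimate
    dz_95   : width of the 95% interval of P(z)
    """
    in_range = (z_cluster + z_margin) < z_point < z_max
    narrow_enough = dz_95 <= max_width
    return in_range and narrow_enough
```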

We use these redshift- and mass-matched simulations to de- 
rive a composite bias and systematic uncertainty for our cluster 
sample. We fit for a constant ratio between the recovered mass and 
the underlying true mass assuming a per-cluster Gaussian scatter 
as reported by the fit to each realization. Table 3 shows the best fit
results for the ratio and 68% confidence interval for each of the dif- 
ferent contamination levels. The posterior probability distribution 
for the ratio is well described by a Gaussian, and we quote the re- 
sults accordingly. We also checked for the presence of a normally 
distributed intrinsic scatter component; none was detected. 
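
For a constant ratio with independent Gaussian per-cluster uncertainties and a uniform prior, the posterior is itself Gaussian, centered on the inverse-variance weighted mean. The sketch below uses that closed form; it is an illustration under those assumptions, with a hypothetical function name, and is not necessarily how the fit was implemented for Table 3.

```python
import math


def fit_constant_ratio(ratios, sigmas):
    """Inverse-variance weighted mean of per-cluster mass ratios.

    ratios : recovered / true mass for each simulated cluster
    sigmas : Gaussian 1-sigma uncertainty on each ratio
    Returns (best-fit ratio, 1-sigma uncertainty on the ratio).
    """
    weights = [1.0 / s ** 2 for s in sigmas]
    total = sum(weights)
    mean = sum(w * r for w, r in zip(weights, ratios)) / total
    return mean, math.sqrt(1.0 / total)
```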

For a typical level of cluster galaxy contamination (10% at
R_500), the expected multiplicative bias over the sample due to the
effects of photo-z uncertainties and catastrophic outliers is a 1.2%
overestimate of the mean cluster mass. The statistical 1σ uncertainties
on this bias are less than 1%. The presence of cluster galaxies
has a minimal impact on the bias, which ranges from 1.4% to 1.1% as
the contamination fraction varies from 0% to 20%. We can apply this

Figure 7. Expected fractional bias in the mass within 1.5 Mpc (mean and
68% confidence interval on the mean from bootstrapping) for each cluster
in the P(z) sample from high galaxy-density simulations using the
B_J V_J R_C i+ z+ filter set. We show the bias for differing levels of contamination
from cluster galaxies, offset in redshift for clarity. For our redshift range,
we detect an overestimate of the mean cluster mass of 1.2%, with a 1%
sensitivity to the assumed cluster galaxy contamination level.



Table 3. Summary of the sample bias and uncertainty from simulations due
to photometric redshift errors while using B_J V_J R_C i+ z+ filters, for different
fractions of cluster galaxy contamination. (1) The fraction of galaxies that
are cluster members at R_500 with respect to field galaxies (Eq. 13). (2) The
mean fractional bias for measured masses within 1.5 Mpc, for the population
of 26 galaxy clusters simulated, and 1σ uncertainties. The posterior
probability distributions for the mean fractional mass bias are well modeled
by a Gaussian.

Contamination Fraction    Mean Fractional Mass Bias
(1)                       (2)
0%                        1.014 ± 0.003
10%                       1.012 ± 0.003
20%                       1.011 ± 0.003



overestimate of 1.2% as a correction to the P(z) masses. Because
we do not know the average contamination for our cluster sample,
but expect it to be ~ 10% (see Section 4), we quote a systematic
uncertainty that spans the calibration results from 0% to 20%
contamination. Even with this caveat, our systematic uncertainty on the
bias from photo-z errors, given the performance in the COSMOS
field, is at most 1%.

We also ran simulations using shear scatter that follows a Voigt
profile instead of a Gaussian. We ran simulations with both one
and two populations of galaxies with different scatter parameters
(σ, Γ) and reconstructed the masses using a wide uniform prior on σ
and Γ. We see no significant change in our results. In an additional
test, we reconstructed masses from simulations with shear scatter
σ_g = 0.20, but fixing the prior to σ_g = 0.16. Such an error leads to
an underestimate in the recovered mass of ~ 5%, emphasizing the
need for flexible priors on scatter parameters.

7.3 Point Estimator Performance 

To emphasize the importance of using the full P(z) information, 
we have also examined an alternative mass reconstruction method, 
using only photo-z point estimators. A point estimator is usually the 
redshift at which the P(z) is maximum, though it could in principle

Figure 8. Simulated fractional bias in cluster mass reconstructions
when only photometric redshift point estimators are used, based on
B_J V_J R_C i+ z+ photometry. The simulated input catalogs are identical to those
used in Fig. 7 with 10% contamination, though different analysis cuts are
applied. For reference, the P(z) method results are shown in light grey, and
are offset slightly in redshift. A significant redshift dependent bias is clearly
seen when using the point estimators.



also be the mean or the median of the posterior probability. BPZ
reports the most likely redshift, marginalized over all templates, as
its point estimator z_b.

We use an alternative set of simulations that include galaxies
at all redshifts, and have our red sequence cuts applied. Assuming
Gaussian scatter for the tangential shears (which is an accurate
assumption for the baseline simulations), we perform a χ² fit between
measured and predicted shear using only the photo-z point estimator:

\chi^2 = \sum_i \frac{\left( g_i - g(z_i, M) \right)^2}{(f\sigma)^2} . \qquad (14)

We follow the implementation outlined in Newman et al. (2009,
2011), which includes a slight inflation of the assumed σ between
model and measurement, f = 1.02, to account for the uncertainty
in redshift estimates. Galaxies are excluded from the fit if 25%
of the probability in the P(z) is at z < z_cluster. Simulation results
using photometric redshifts calculated with both u B_J V_J R_C i+ z+ and
B_J V_J R_C i+ z+ photometry, with cluster galaxy contamination normalized
to 10% at R_500, are provided in Figure 8. This method and
associated cuts display a clear redshift-dependent bias. At higher
redshifts (z > 0.4), the use of photo-z point estimators leads to a
≈7% systematic bias.
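
The point-estimator statistic and its exclusion cut can be sketched as below. This is an illustration rather than the Newman et al. implementation: the galaxy dictionary layout, the `model_shear` callable (the predicted shear at a given redshift for a fixed trial mass), and reading the cut as "25% or more" are all assumptions.

```python
def prob_below(z_grid, pdf, z_cut):
    """Fraction of P(z) lying at z < z_cut, for a pdf on a uniform grid."""
    total = sum(pdf)
    below = sum(p for z, p in zip(z_grid, pdf) if z < z_cut)
    return below / total


def point_estimate_chi2(galaxies, model_shear, sigma_g, z_cluster,
                        f=1.02, max_prob_below=0.25):
    """Chi-squared between measured shear and the model at the photo-z point estimate.

    galaxies    : list of dicts with keys 'g_meas', 'z_point', 'z_grid', 'pdf'
    model_shear : callable z -> predicted shear for the trial cluster mass
    f           : inflation factor applied to the shear scatter sigma_g
    """
    chi2 = 0.0
    for gal in galaxies:
        # Skip galaxies with too much probability in front of the cluster.
        if prob_below(gal["z_grid"], gal["pdf"], z_cluster) >= max_prob_below:
            continue
        residual = gal["g_meas"] - model_shear(gal["z_point"])
        chi2 += residual ** 2 / (f * sigma_g) ** 2
    return chi2
```

Minimizing this quantity over the trial mass gives the point-estimator mass; the full P(z) method instead weights the model shear by the whole redshift posterior of each galaxy.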



8 MASS MEASUREMENT RESULTS 

In this section, we report the masses measured for each cluster using
both the color-cut and P(z) methods. Table 4 lists the best fit
masses and 68% statistical confidence intervals for each method.
Mass point estimates are median values, with 68% confidence intervals
defined as the 16th to the 84th percentile values. Sections 4
and 5 detail how the statistical uncertainties are derived for each
method. The uncertainties on individual cluster masses from triaxiality,
line-of-sight structure, and correlated structure are not included
in the quoted confidence intervals. In combination, these effects
can be expected to add to the uncertainty in individual masses
at the 20% level (Becker & Kravtsov 2011; Hoekstra et al. 2011;

Figure 9. A comparison of the precision of the mass measurement for all
clusters in the sample, measured as the fractional uncertainty in the measured
mass, for the color-cut (51 clusters) and the P(z) methods (27 clusters).
The asymmetric uncertainties listed in Table 4 have been averaged
for this plot. Both methods achieve a similar level of precision, with a
P(z) to color-cut statistical-error mean ratio of 1.16. The outlier at z=0.54
is MACS 1423+24 (see also Limousin et al. 2010).



Corless & King 2007). The overall systematic uncertainty from
these sources on the mean cluster-sample mass is addressed separately
in Section 9.

First, we compare the statistical precision with which the 
color-cut method and the P(z) method constrain cluster masses. 
Figure 9 shows the fractional uncertainty measured for each cluster
in our sample, as a function of redshift. Recall that P(z) masses 
are only determined for 27 of the 51 clusters. Both methods mea- 
sure cluster masses with comparable statistical precision, despite 
the lower number of galaxies available for the P (z) method anal- 
ysis. The mean ratio of P(z) to color-cut 68% statistical errors is 
1.16. The mild loss of precision for both methods at high redshift is 
driven primarily by the decrease in the angular size of the clusters, 
and thus the smaller number of galaxies accepted into the shear 
profile fit. 

Next, we compare the mass estimates from the photometric
redshift based P(z) and the color-cut methods, to check for internal
consistency. Figure 10 shows the ratio between the color-cut and
P(z) masses for each cluster, and for the sample as a whole. The
mass measurements from our two methods are correlated because 
the same galaxy catalogs are used as input for each. We bootstrap 
the last common galaxy catalog for each cluster to determine the 
correlated uncertainties between the two methods. 
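
Bootstrapping a shared catalog and evaluating both estimators on every resample is what preserves the correlation between the two measurements. A minimal sketch, with a hypothetical function name and arbitrary estimator callables standing in for the color-cut and P(z) mass fits:

```python
import random


def bootstrap_pairs(catalog, estimator_a, estimator_b, n_boot=200, seed=0):
    """Evaluate two estimators on the same bootstrap resamples of a catalog.

    Returns a list of (estimate_a, estimate_b) tuples, one per resample.
    Because each pair shares its galaxies, the pairs sample the joint
    (correlated) distribution of the two estimates.
    """
    rng = random.Random(seed)
    n = len(catalog)
    pairs = []
    for _ in range(n_boot):
        sample = [catalog[rng.randrange(n)] for _ in range(n)]
        pairs.append((estimator_a(sample), estimator_b(sample)))
    return pairs
```

In the analysis each estimator would apply its own galaxy selection to the resampled catalog before fitting; here the callables are left entirely to the caller.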

For individual cluster masses, scatter between the two meth- 
ods arises due to the effects of cosmic variance in the color- 
cut method and the differences in galaxy selection. Correlation 
between the two methods should decrease with increasing clus- 
ter redshift as the effects of cosmic variance become more pro- 
nounced and galaxy selection diverges. The uncertainty in the 
cross-calibration ratio at z < 0.4 is ~ 20%, while in contrast, un- 
certainty on the ratio at z > 0.4 grows to ~ 40% at the highest 
redshifts. 

We use the bootstrapped masses to measure the ratio and in- 
trinsic scatter between the two methods. Any systematic offset be- 
tween the two methods would most likely indicate one or more 
of the following: systematic errors in the color-cut method, arising 
from a mismatch between the COSMOS field and the average clus- 



Figure 10. A comparison of masses recovered from the color-cut and the
P(z) methods. Error bars for each cluster point are determined by bootstrapping
the input catalog for both methods simultaneously. Points are the
median ratio and 68% confidence interval for each cluster from the bootstrap
realizations. P(z) masses do not include the calibration correction
from Section 7.2. The dashed line and red shaded region are the best fit ratio
between the two methods, β = 0.999^{+0.046}_{-0.041}, for all 27 clusters with
B_J V_J R_C i+ z+ photometry.



ter field; the use of the wrong number density model or an incorrect 
estimate of the galaxy density background level for the color-cut 
method's contamination correction; or bias in the color-cut masses 
due to using the single source plane approximation. Additionally, 
intrinsic scatter between the methods may be induced if systematic 
scatter exists in the derived field-galaxy redshift distribution (with 
respect to COSMOS) or the contamination correction in the color- 
cut method. The severity of any systematic bias and scatter (sta- 
tistical or intrinsic) is expected to worsen for higher redshift fields 
(see Section 4.4), but we see no evidence for a redshift dependence
given the noise level present in our data. 

To measure an offset, we fit for the ratio β between the color-cut
and P(z) masses, with an additional intrinsic, log-normal scatter
with width σ_int. Assuming uniform priors, the model likelihood is

P(\beta, \sigma_{\rm int}) \propto \prod_i \int_{M_{i,P(z)},\, M_{i,{\rm cc}}} \mathcal{N}\!\left( \ln \frac{M_{i,{\rm cc}}}{\beta\, M_{i,P(z)}} ;\, \sigma_{\rm int} \right) P(M_{i,P(z)}, M_{i,{\rm cc}}) \, dM , \qquad (15)

where the correlated uncertainties between the two measurements,
P(M_{i,P(z)}, M_{i,cc}), are defined by bootstrap sampling on the last common
galaxy catalog. We evaluate the marginalization integral numerically
by converting it to a sum over bootstrap samples. Note that
this formulation breaks down for small values of σ_int due to the limited
number of available bootstrap samples; we therefore only consider
σ_int > 0.02. The best fit value for the offset is β = 0.999^{+0.046}_{-0.041},
showing no offset between the P(z) and color-cut methods. We do
not claim a detection of intrinsic scatter between the two methods,
and instead measure a 2σ upper bound at 15%. We must stress that
these calibration results apply to our particular implementation of
the color-cut method only.
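
Replacing the marginalization integral by a sum over bootstrap samples can be sketched as follows. The function name and data layout are hypothetical, the log-normal term is assumed to be a zero-mean Gaussian in ln(M_cc / (β M_P(z))), and all bootstrap masses are assumed to be positive.

```python
import math


def log_likelihood(beta, sigma_int, boot_masses):
    """Log-likelihood of (beta, sigma_int), up to an additive constant.

    boot_masses : one list per cluster of (M_pz, M_cc) bootstrap pairs, which
                  stand in for samples from the joint distribution P(M_pz, M_cc).
    """
    if sigma_int <= 0.02:
        raise ValueError("sigma_int too small for the bootstrap sum to be reliable")
    norm = 1.0 / (sigma_int * math.sqrt(2.0 * math.pi))
    total = 0.0
    for pairs in boot_masses:
        acc = 0.0
        for m_pz, m_cc in pairs:
            x = math.log(m_cc / (beta * m_pz))
            acc += norm * math.exp(-0.5 * (x / sigma_int) ** 2)
        # Monte Carlo estimate of the integral for this cluster.
        total += math.log(acc / len(pairs))
    return total
```

Evaluating this function on a grid in (β, σ_int) yields the posterior under uniform priors. With a finite number of resamples, a very narrow Gaussian may fall between samples, which is the breakdown at small σ_int noted in the text.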



© 2010 RAS, MNRAS 000.mi24l 



Weighing the Giants III: Methods & Measurements of Accurate Lensing Cluster Masses 15 



Table 4. Lensing masses from the P(z) method and the color-cut method. (1) cluster name; (2) cluster redshift. Columns (3) & (4) report results from the
P(z) method: (3) median scale radius and 68% confidence interval after marginalization; (4) median mass within 1.5 Mpc of the cluster center and 68%
confidence interval after marginalization. Columns (5) & (6) report results from the color-cut method: (5) median fit scale radius and 68% confidence interval
(concentration set to four); (6) median mass within 1.5 Mpc of the cluster center and 68% confidence interval. Masses are in units of 10^14 M_⊙. P(z) masses do
not include the calibration correction from Section 7.2.

Cluster             Redshift
(1)                 (2)
A2204               0.152
A750                0.163
RXJ1720.1+2638      0.164
A383                0.188
A209                0.206
A963                0.206
A2261               0.224
A2219               0.228
A2390               0.233
RXJ2129.6+0005      0.235
A521                0.247
A1835               0.253
A68                 0.255
A2631               0.278
A1758N              0.279
RXJ0142.0+2131      0.280
A611                0.288
Zw7215              0.290
A2552               0.302
MS2137.3-2353       0.313
MACSJ1115.8+0129    0.355
RXJ1532.8+3021      0.363
A370                0.375
MACSJ0850.1+3604    0.378
MACSJ0949.8+1708    0.384
MACSJ1720.2+3536    0.387
MACSJ1731.6+2252    0.389
MACSJ2211.7-0349    0.397
MACSJ0429.6-0253    0.399
RXJ2228.6+2037      0.411
MACSJ0451.9+0006    0.429
MACSJ1206.2-0847    0.439
MACSJ0417.5-1154    0.443



9 OVERALL SYSTEMATIC UNCERTAINTY &
CROSS-CHECKS

Systematic biases can enter a lensing analysis from three primary
sources: galaxy shape measurements; the mass model; and the already
discussed uncertainties associated with the redshift distribution
(or, as in the case of the P(z) method, propagation of
P(z) errors directly into mass measurements). Accurate quantification
of these uncertainties is particularly important to maintain the
power and integrity of cosmological constraints (e.g. Allen et al.
2008; Mantz et al. 2008, 2010a; Wu et al. 2010; Allen et al. 2011).
In this section, we estimate the level of each of these systematic
uncertainties. We also present a series of cross-checks. Since the
primary goal of this series of papers is to calibrate mass proxies
for cosmological studies, we will concentrate our discussion on
systematic uncertainties affecting the measured mean mass of our
sample. Section 9.5 provides a summary of all significant systematic
uncertainties in the analysis.



9.1 Shape Measurement Uncertainty 

The dominant systematic uncertainty for lensed-galaxy shape mea- 
surements is the shear correction derived from STEP2. Both the 
precision to which we can measure the shear calibration using 
STEP2 simulations, and the differences between our measurements 
and what was simulated in STEP2, contribute to the systematic un- 
certainty in the mean cluster mass. 

Our shear calibration model (Eq. 12) has 4 free parameters at
a fixed PSF size. We measure the mean values and covariance of
these parameters from STEP2 simulations. The finite size of the
STEP2 simulations places a limit on our ability to constrain the
correction parameters. While the uncertainty on the shear correction
is subdominant to the statistical noise for individual clusters
(this is marginalized over in the P(z) method and ignored in the
color-cut method), the shear correction scatters coherently for the
cluster sample and will affect the cluster mean mass. Most importantly,
the multiplicative shear bias, m, will scale the mean mass of
the sample approximately linearly with (1 + m).

We approximate the uncertainty on the mean mass arising 
from the multiplicative shear bias by measuring the distribution 
width of (1 + m) at fixed object size. The distribution of (1 + m) is 
approximately Gaussian, with a standard deviation no larger than 
3% of the correction value, for objects at nearly all sizes for STEP 
image sets A and C (on which we base our correction). We verified
that this result is insensitive to the prior we place on the shear
correction model (Eq. 12). A 3% uncertainty is conservative, as
it roughly represents a vertical translation of the curves shown in
Fig. 6. The detailed correlations between correction parameters,
convolved with the object shape distributions in each cluster, will 
likely result in smaller mean mass variations. 

In addition to the statistical limits of the STEP2 calibration, we
are susceptible to the finite sample of PSFs tested in STEP2. Shear
calibration corrections depend on details of the PSF size and ellipticity
(Heymans et al. 2006; Massey et al. 2007). While the STEP2
simulations are designed to mimic SuprimeCam observations, the
simulations do not span the entire space of observed PSFs in our
observations. We estimate our corrections from two STEP2 image
sets, A (seeing 0.6") & C (seeing 0.75"), that represent well-behaved
observations.

Figure 11 shows the measured PSF size and ellipticity distributions
for our images, with the STEP2 image sets marked for
comparison. A PSF size of 0.6" is typical of our observations,
with 0.75" seeing bracketing the majority of our observations from
above. The mean mass does not depend sensitively on the details of
how we interpolate between the two shear corrections. If we instead
linearly extrapolate the shear correction beyond the seeing spanned
by the STEP2 images (to less than 0.6" or larger than 0.75"), the
mean mass shifts by no more than 1% (i.e., most of our observations
are within the interpolation regime).

In addition to the PSF size, STEP2 simulations only coarsely
span the PSF ellipticities observed in the cluster fields. We compare
the distribution of PSF ellipticity magnitudes from the data
to available STEP2 images in Figure 11. Our observations tend to
have a more elliptical PSF than the baseline image sets A (0.6",
e = 0.01) and C (0.75", e = 0.01), but are significantly less elliptical
than set D (0.7", e = 0.09). If we were to compare the average
cluster masses recovered from applying a correction derived purely 
from the STEP2 high ellipticity image set, D, to the baseline correc- 
tion (interpolating between sets A and C), our masses would be 2% 
lower. Since the shear correction derived from image set D does not 
take into account the known size dependence, this 2% difference is 
an upper bound on the systematic uncertainty on our mean cluster 
mass measurement from unmodeled size and ellipticity dependen- 
cies in our shear calibration. 

Because of the STEP2 simulation's use of realistic galaxy 
shapes, we do not expect a significant systematic bias from a mis- 
match in the galaxy population sampled in the STEP2 images com- 
pared to our Subaru observations. 

Additional systematic uncertainties from shape measurements 
may arise from either the image coaddition process, or how the PSF 
model is interpolated over each coadded field. Section 5.5 and Ap- 
pendix B from Paper I detail extensive checks of the PSF and coad- 
dition procedure. Cluster masses measured from image coadditions 
for different camera rotations, and different nights of comparable 
seeing, showed no signs of systematic bias. In addition, our mea- 
sured cluster masses are insensitive to the particular polynomial 
order used to interpolate the PSF across the field of view, above a 
minimum order optimized for each observation. From our analysis 
in Paper I, we estimate that the systematic uncertainty associated 
with these sources is no more than 1%. 

Stellar contamination in the lensing catalogs can cause a sys- 
tematic dilution of lensing masses. For our analysis, we have se- 
lected galaxies by only accepting objects at least 15% larger than 
the PSF size. Masses are consistent at the 0.5% level when we em- 
ploy a PSF size cut of 20%, taking into account the correlations 
between such mass measurements, implying that our standard 15% 
size cut is sufficient. We have also visually inspected the color-color 
photometry diagrams for a stack of 30 cluster catalogs for evidence 
of a stellar locus; no stellar locus was observed. 

In conclusion, the total systematic uncertainty associated with 
shear measurements is 4% for our analysis. 
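
The text quotes only the 4% total. As a consistency check, the sketch below combines the individual shape-measurement terms discussed in this section in quadrature; both the quadrature combination (which presumes independent terms) and the selection of terms are assumptions of this illustration.

```python
import math


def quadrature_sum(terms):
    """Combine independent fractional uncertainties in quadrature."""
    return math.sqrt(sum(t * t for t in terms))


# Values in percent, taken from the discussion in Section 9.1.
shear_terms_percent = {
    "STEP2 calibration precision": 3.0,
    "seeing extrapolation": 1.0,
    "PSF ellipticity and size dependence": 2.0,
    "coaddition and PSF interpolation": 1.0,
    "stellar contamination (size cut)": 0.5,
}

total_percent = quadrature_sum(shear_terms_percent.values())
```

Under these assumptions the total comes to roughly 3.9%, in line with the quoted 4%.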

9.2 Mass Model Uncertainty 

In the limit of perfect shear and photo-z measurements, simulations
and analytical investigations show that the application of spherical
NFW models to lensing analyses will suffer a ~20% system-to-system
scatter, although the overall mean sample bias should be
small if the observed sample draws fairly from the triaxial distribution
of halos (and if one is careful about the radial range considered:
Becker & Kravtsov 2011; Corless & King 2007). We expect
no significant systematic bias originating from the selection
function for our sample, since our clusters are X-ray selected and 
therefore should approximately fairly sample all possible orienta- 



Figure 11. Left: The distribution of seeing (FWHM, in arcsec) for cluster observations in the comparison set. Right: The distribution of PSF ellipticity |e| for those same fields. We take the
median ellipticity magnitude from stars in a field as the PSF ellipticity. Vertical lines mark values for STEP2 image sets A, C, and D.


tion angles. However, the finite number of clusters in our sample 
limits our ability to average over triaxial orientations and we there- 
fore expect some residual scatter in the sample mean mass from 
this source. For our full set of 51 clusters, we expect a systematic
uncertainty of 20%/√51 ≈ 3%.
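
The estimate above is the standard error on a mean of N independent measurements. A short sketch (hypothetical function name), assuming cluster orientations are independent draws:

```python
import math


def scatter_on_mean(per_cluster_scatter, n_clusters):
    """Uncertainty on a sample mean from independent per-cluster scatter."""
    return per_cluster_scatter / math.sqrt(n_clusters)


# 20% per-cluster scatter averaged over the 51 clusters of the full sample.
full_sample_percent = scatter_on_mean(20.0, 51)
```

The same function shows how the residual scatter grows for subsamples, for example the 27 clusters with P(z) masses.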

While an NFW halo model is the traditional choice for lensing
analyses, some recent works have argued that the NFW model
is a poor description of dark matter halos beyond the virial radius
(Oguri & Hamana 2011; Bahé et al. 2011). Halo density profiles
drop off faster than NFW beyond the virial radius, before a stochastic
"two-halo" term becomes important. Multiple authors (e.g.
Becker & Kravtsov 2011; Oguri & Hamana 2011; Bahé et al. 2011)
have shown that fitting an NFW profile to arbitrarily
large projected radii can result in significant mass-estimate
biases. However, the same authors also show that a judiciously chosen
outer fit radius essentially eliminates this bias, motivating our
outer radius cut at 3 Mpc.

To check for an outer fit-radius dependent bias, we compare 
our baseline mass measurements, measured in a radial range of 
750kpc - 3Mpc, to masses measured from 750kpc - 5Mpc (or until 
the edge of the available field for low redshift clusters). Using boot- 
strapped catalogs to properly account for the correlations between 
the two measurements, we see only a marginal shift in the mean 
cluster mass. Masses from fits to data out to 5Mpc are, on average, 
lower by a factor of 0.987+°'°[g. 

To test the susceptibility of our measured masses to the specific
form of the model density profile, we also measure masses
using two smoothly truncated NFW-like profiles from Baltz et al.
(2009), referred to as 'BMO-1' and 'BMO-2' in Oguri & Hamana
(2011). The BMO profiles truncate the NFW model at different
rates at a truncation radius t. The BMO-2 profile was verified
against N-body simulations in Oguri & Hamana (2011). Given the
nature of our data, we do not include the stochastic "two-halo" term
when fitting the BMO profiles (we also use a different radial fit
range and measure a different mass than Oguri & Hamana 2011).
We find that the BMO-2 profile returns a mean cluster mass 2%
smaller than the NFW profile, with mass and redshift dependence.
The BMO-1 profile returns a mean mass 3 - 4% smaller than the
NFW profile, depending on the assumed truncation radius, t = 2.6
or t = 2, respectively.
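
For orientation, the three-dimensional BMO density profiles multiply the NFW profile by a truncation factor raised to the power n = 1 (BMO-1) or n = 2 (BMO-2). The sketch below writes this with the radius x and the truncation radius tau both in units of the scale radius; that unit convention is an assumption of this illustration and need not match the convention behind the t values quoted above. Function names are hypothetical.

```python
def nfw_density(x, rho_s=1.0):
    """NFW density at x = r / r_s, in units set by rho_s."""
    return rho_s / (x * (1.0 + x) ** 2)


def bmo_density(x, tau, n, rho_s=1.0):
    """Smoothly truncated NFW density (BMO-n) at x = r / r_s.

    tau : truncation radius in units of the scale radius
    n   : 1 for BMO-1, 2 for BMO-2 (sharper truncation)
    """
    if n not in (1, 2):
        raise ValueError("n must be 1 (BMO-1) or 2 (BMO-2)")
    truncation = (tau ** 2 / (tau ** 2 + x ** 2)) ** n
    return nfw_density(x, rho_s) * truncation
```

Well inside the truncation radius both profiles approach NFW; at x = tau the density is suppressed by a factor of 2 for BMO-1 and 4 for BMO-2.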

Note that publicly available simulations do not yet probe cluster
populations in the specific mass range (M_500 > 10^15 M_⊙) studied
here with sufficient statistics. New, large-volume simulations will
offer improved guidance to the systematic uncertainties for such
objects, and will likely improve these systematic tolerances. Until
such simulation guidance becomes available, however, we associate
a 3% systematic uncertainty from our use of NFW models.

We have verified that our choice of center for the NFW profile 
fit, the X-ray centroid, does not lead to a systematic uncertainty in 
the cluster mass. We see no change in the mean cluster mass to 1% 
if we instead center the NFW profile fits on the BCG in each system 
(see Paper I). 

In summary, uncertainties in the mass model contribute a total
systematic uncertainty of ≈ 4%.



9.3 Redshift Distribution Uncertainty 

In Section 7 we estimated the uncertainty in the P(z) mass
measurements due to a range of systematic uncertainties associated
with photo-z performance. For our sample redshift range, and using
$B_J V_J R_C I_C z^+$ photometry, we showed the P(z) method typically
overestimates the mass by 1.2%, with an uncertainty of ≈ 1%. This
uncertainty is dominated by the unknown fraction of contaminating
cluster galaxies present in each cluster.

Several additional sources of potential systematic scatter,
associated with photometric calibration (Paper II), are not captured
by the simulations discussed in Section 7. The photometric calibration
could use other versions of the stellar locus, and/or employ zeropoint
training in the $B_J$ filter. Overall, our study of alternative
prescriptions for the photometric calibration results in shifts in the
mean cluster sample mass of up to ≈ 3%. Nonetheless, residual
calibration and photo-z systematic uncertainties are subdominant to
other sources of error in the analysis.

COSMOS uses the $r^+$ and $i^+$ filters while our observations
typically use the SuprimeCam $R_C$ and $I_C$ filters. The SuprimeCam
filter set resembles the SDSS $r^+$ and $i^+$ filters more so than
traditional Johnson-Cousins filters^5. While we cannot explicitly
verify the biases discussed above using $R_C$ and $I_C$, we do not
anticipate significant differences from the values quoted. We also
note that the depths of observations used for this study vary, and do
not exactly match the depths to which the COSMOS field was observed.
However, we again expect that these issues are secondary to the direct
uncertainties from photo-z's discussed, and are not dominant in the
analysis. To be conservative, we allocate an additional 1% uncertainty
for these effects.

Adding these uncertainties in quadrature, we estimate a total
systematic uncertainty associated with the redshift distribution of
≈ 3%.
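
The quadrature sum can be checked with a few lines of Python. This is only an illustrative sketch: the dictionary labels are shorthand introduced here, and the three inputs are the percentages quoted in this subsection (≈ 1% for the simulated photo-z bias, up to ≈ 3% for the photometric calibration, and 1% for depth and filter differences).

```python
import math


def quadrature_sum(components):
    """Combine independent fractional uncertainties in quadrature."""
    return math.sqrt(sum(c ** 2 for c in components))


# Percent of the mean cluster mass, as quoted in the text above.
# The labels are shorthand chosen for this sketch.
redshift_terms = {
    "simulated photo-z bias": 1.0,
    "photometric calibration": 3.0,
    "depth and filter mismatch": 1.0,
}

total = quadrature_sum(redshift_terms.values())
print(f"redshift-distribution systematic: {total:.1f}%")
```

The largest term dominates: the two 1% contributions raise the total only slightly above the 3% calibration term, which is why the combined value is still quoted as ≈ 3%.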

9.4 Data-Driven Systematic Cross-Checks 

A number of additional systematic cross-checks were performed to
verify the accuracy of the P(z) method. These checks consisted of
splitting the galaxies in each field into subsamples and determining
cluster masses for each subsample. These checks provide reassurance
that the statistical model assumed in the P(z) method is adequate, but
do not provide sufficient statistical power to quantify additional
residual systematic uncertainties in the analysis. We know from the
COSMOS-30 simulations in Section 7 that the mean masses should be
accurate. Therefore any offsets detected in these tests represent
internal tensions in the analysis and avenues for improvement in the
precision of individual cluster masses.

As a cross-check on the suitability of the size-dependent shear
calibration, we independently measure cluster masses with galaxies
above and below the median galaxy size in each field. Figure 12a shows
the ratio between the two masses for each cluster in the P(z) sample,
as a function of redshift. The mean ratio between the two
reconstructions is 1.01, consistent with no offset within the 1σ
uncertainties. We repeat the exercise in Fig. 12b, now splitting
galaxies into samples at the median shape S/N value in each field. The
mean ratio between the two reconstructions is 0.91, a 1σ offset. For
both cross-checks, galaxies were accepted into the fit from a larger
fit range, 600 kpc < R < 5 Mpc, to increase the available galaxy
statistics.

To check for consistency in the shear signal between background
galaxies at different redshifts, we fit masses to independent samples
below and above the median lensed galaxy redshift in each field. This
is equivalent to a simplified "shear-ratio" consistency check (Taylor
et al. 2007). The median redshift is z ≈ 0.8, depending on the
redshift of the cluster. The results are shown in Fig. 12c. From the
bootstrap analysis, we see that masses reconstructed with
high-redshift galaxies tend to be lower than masses reconstructed with
low-redshift galaxies, with a mean ratio of 0.88. The 2σ confidence
region is 0.74 to 1.00. We also split lensed galaxies by their $i^+$
magnitude, either measured or reconstructed, and independently
measured the cluster masses (Fig. 12d). The ratio of masses from
bright objects to faint objects is 1.02.
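
A minimal sketch of how a mean mass ratio and its confidence regions might be estimated by bootstrap is given below. It is a simplification and not the procedure used for the results above: it resamples per-cluster ratios, whereas the analysis here bootstraps the galaxy catalogs. The function name, the number of resamples, and the input ratios are all hypothetical.

```python
import random
import statistics


def bootstrap_mean_ratio(ratios, n_boot=2000, seed=0):
    """Return the mean ratio with 68% and 95% bootstrap percentile intervals."""
    rng = random.Random(seed)
    n = len(ratios)
    # Resample clusters with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(ratios, k=n)) for _ in range(n_boot)
    )

    def percentile(p):
        return means[min(n_boot - 1, int(p * n_boot))]

    return (
        statistics.fmean(ratios),
        (percentile(0.16), percentile(0.84)),
        (percentile(0.025), percentile(0.975)),
    )


# Hypothetical per-cluster mass ratios (one subsample over the other);
# these are synthetic numbers, not measurements from this work.
demo_rng = random.Random(1)
demo_ratios = [demo_rng.gauss(0.9, 0.3) for _ in range(27)]

mean, ci68, ci95 = bootstrap_mean_ratio(demo_ratios)
print(f"mean ratio = {mean:.2f}")
print(f"68% interval = ({ci68[0]:.2f}, {ci68[1]:.2f})")
print(f"95% interval = ({ci95[0]:.2f}, {ci95[1]:.2f})")
```

A ratio is then described as deviating from unity at the 1σ or 2σ level according to whether 1.0 falls outside the 68% or the 95% interval.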

We see that internal tension exists in the data, particularly when
galaxies are divided by their shape S/N or by their redshift. We do
not interpret this tension as a sign of systematic error. Instead,
this tension represents a promising avenue to increase the precision
of individual cluster masses in future work.

9.5 Summary of Systematic Uncertainties 

Systematic uncertainties in our mass calibration arise from the shear
measurements, the assumed mass model, and from uncertainties in the
lensed-galaxy redshifts. We have investigated each in turn with checks
from simulations and data. Table 5 summarizes our estimates of the
systematic uncertainties in the analysis.

The systematic uncertainties associated with the shear measurements
and the mass model apply both to the color-cut and the P(z) methods.
The uncertainties associated with redshift measurements listed in the
table apply only to the P(z) method, and are constrained to be less
than 2%. The systematic uncertainties that only apply to the color-cut
method are more difficult to quantify (Section 4.4). To gauge the
uncertainties in those measurements, we therefore pursue a strategy of
cross-calibration. By scaling the color-cut masses by the average
ratio between the two methods (as measured in Section 8), we calibrate
out the unknown systematic uncertainties in the color-cut analysis for
the price of adding the statistical uncertainty in the average ratio.
For the 27 clusters with $B_J V_J R_C I_C z^+$ photometry, the
uncertainty in the cross-calibration between the color-cut and P(z)
methods is ~ 4%.

We expect each source of systematic uncertainty to be independent, and
have approximated each source as a Gaussian. Our total systematic
uncertainty on the mean cluster mass, for 51 clusters, is therefore
7%. Results are comparable when only masses measured with the P(z)
method are used.
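
As a rough check of this budget, the individual entries of Table 5 can be combined in quadrature. The sketch below rests on an assumption about the table layout: the two triaxiality entries (3% and 4%) are taken to belong to the color-cut (51 clusters) and P(z) (27 clusters) columns respectively, the photo-z entries are counted only for the P(z) column, and the cross-calibration term only for the color-cut column. The variable names are illustrative.

```python
import math


def quadrature_sum(components):
    """Combine independent Gaussian uncertainties in quadrature."""
    return math.sqrt(sum(c ** 2 for c in components))


# Entries from Table 5, in percent of the mean cluster mass.
shear = [3.0, 2.0, 1.0]       # shear bias correction, PSF mismatch, coaddition
profile = 3.0                 # NFW profile uncertainty
triaxiality_color_cut = 3.0   # assumed column assignment (51 clusters)
triaxiality_pz = 4.0          # assumed column assignment (27 clusters)
photo_z = [3.0, 1.0, 1.0]     # photometry, simulated bias, depth/filter mismatch
cross_calibration = 4.0       # color-cut to P(z) calibration

color_cut_total = quadrature_sum(
    shear + [triaxiality_color_cut, profile, cross_calibration]
)
pz_total = quadrature_sum(shear + [triaxiality_pz, profile] + photo_z)

print(f"color-cut total: {color_cut_total:.1f}%")
print(f"P(z) total:      {pz_total:.1f}%")
```

Under these assumptions both totals round to the 7% quoted above, with no single term dominating either budget.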



10 COMPARISON TO THE LITERATURE 

For a number of clusters considered here, previous weak-lensing 
mass measurements have been reported in the literature. In this 
section, we compare our mass measurements with those works, in 
cases where those studies have employed a homogeneous weak- 
lensing methodology, have at least five clusters in common with 
the present study, and quote mass measurements at a suitably large 
density contrast ($M_{500}$, $M_{200}$, etc.). All of the previously reported
mass measurements considered here are based on variations of the 
color-cut method. Almost all of the clusters overlapping the present 
study are at relatively low redshifts, z ~ 0.2; in this redshift regime, 
the color-cut method can in principle provide robust mass measure- 
ments, although significant care needs to be taken in calibrating 
the shear measurements, correcting for contamination from cluster 
galaxies, and in estimating the redshift distribution of background 
galaxies (Section^. 

To facilitate the most direct comparison of mass measurements, we
redetermine masses within the same radius used in the literature work.
In cases where the measurement radius is not explicitly reported, we
calculate it from the cited overdensity mass,
$r_\Delta = \left(3 M_\Delta / 4\pi\Delta\rho_c(z)\right)^{1/3}$,
adopting the cosmology used in the literature work. In each case, we
fit the shear profile over the radial range 750 kpc - 3 Mpc, and
measure the mass within the measurement radius using the calibrated
color-cut method and NFW-model fits.
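
The conversion from an overdensity mass to its radius can be sketched as follows. The flat ΛCDM cosmology with H0 = 70 km/s/Mpc and Ωm = 0.3 is an assumption made for this example only; in the comparison itself the cosmology of each literature work is adopted. Function and parameter names are illustrative.

```python
import math

# Gravitational constant in Mpc (km/s)^2 / Msun.
G = 4.30091e-9


def critical_density(z, h0=70.0, omega_m=0.3):
    """Critical density rho_c(z) in Msun / Mpc^3 for flat LambdaCDM (assumed)."""
    hz_squared = h0 ** 2 * (omega_m * (1.0 + z) ** 3 + (1.0 - omega_m))
    return 3.0 * hz_squared / (8.0 * math.pi * G)


def overdensity_radius(mass, delta, z, **cosmology):
    """r_Delta in Mpc from M_Delta in Msun: (3 M / (4 pi Delta rho_c))^(1/3)."""
    rho_c = critical_density(z, **cosmology)
    return (3.0 * mass / (4.0 * math.pi * delta * rho_c)) ** (1.0 / 3.0)


def overdensity_mass(radius, delta, z, **cosmology):
    """Inverse relation: mass enclosed within r_Delta at mean density Delta * rho_c."""
    rho_c = critical_density(z, **cosmology)
    return 4.0 / 3.0 * math.pi * delta * rho_c * radius ** 3


# Illustrative cluster: M_500 = 1e15 Msun at z = 0.2.
r500 = overdensity_radius(1e15, 500, 0.2)
print(f"r_500 = {r500:.2f} Mpc")
```

For this illustrative cluster the radius comes out at roughly 1.4 Mpc, which lies inside the 750 kpc - 3 Mpc fit range used here.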

10.1 Comparison to Okabe et al. (2010)

The study of Okabe et al. (2010, 30 clusters in total) has the largest
overlap with our sample, with 14 clusters in common. The mass
measurements are based on two-filter imaging with SuprimeCam, and some
of the raw data are in common with our study.

We find a significant offset between the Okabe et al. (2010) mass
measurements and ours, with the former being lower on average by ~ 25%
(Fig. 13a). Such a large offset is likely to be due to a combination
of effects. One possible cause for a sizable bias lies in the
different depths of the galaxy catalogs used for the lensing analysis.
Okabe et al. (2010) use galaxies to typically $i^+ < 26$, at least a
magnitude fainter than our completeness limits (Section 4). The
completeness limits in our study were set such that the
signal-to-noise of each galaxy's shape measurement is large enough to
ensure that the shear measurement bias can be robustly calibrated

Figure 12. For each galaxy cluster with five-filter photometry, we reconstruct masses independently for different subsamples of equal statistical weight. For
each cluster, we split galaxies at the median value of (a) background galaxy size, (b) background galaxy shape S/N as reported by analyseldac, (c) background
galaxy redshift, and (d) I-band magnitude. The black points show the ratios and 1σ uncertainties for each cluster in the sample. The dark and light red bands
are the 1σ and 2σ uncertainties on the ratio for the sample. Out of the four tests, we detect a ratio deviating from 1 at the 1σ level in one test (splitting by
shape S/N), and one test deviating at 2σ (splitting by galaxy redshift). This internal tension will increase the cluster-to-cluster scatter and the
per-cluster error; however, it should not impact the mean sample mass.



Table 5. Summary of the sources and levels of systematic uncertainty in the analysis. Shear measurement and mass model uncertainties apply equally to the
color-cut and P(z) methods. The quoted redshift measurement uncertainties apply only to the P(z) method; the color-cut method is subject to other, harder to
quantify systematic uncertainties discussed in the text when not cross-calibrated from the P(z) method. Additionally, systematic uncertainties are quoted for
all 51 clusters studied with the color-cut method, and 27 clusters that use the P(z) masses directly. Values in the table are reported to single-digit precision.



Uncertainty Source                      % of Mean Cluster Mass
                                        Color-Cut Method    P(z) Method

Shear Measurements
  Multiplicative Shear Bias Cor         3%                  3%
  STEP PSF Mismatch                     2%                  2%
  Coaddition & PSF Interpolation        1%                  1%

Mass Model
  Triaxiality & LOS Structure           3%                  4%
  Profile Uncertainty                   3%                  3%

Photo-z Measurements
  Residual Photometry Systematics       -                   3%
  Simulated Photo-z Bias                -                   1%
  Depth & Filter Mismatch               -                   1%

Method Cross-Calibration                4%                  -

Total Systematic Uncertainty            7%                  7%

Figure 13. Comparison of our mass measurements to results in the literature. Panel a) shows the comparison to Okabe et al. (2010), panel b) to Mahdavi et al.
(2008), panel c) to Bardeau et al. (2007), and panel d) to Pedersen & Dahle (2007). For each comparison, we measure the mass within the overdensity radius $r_\Delta$
of the respective work. The solid line indicates the one-to-one line, the long-dashed line shows the average of the mass ratios, and the dotted line the median.
For simplicity, the unweighted average is shown, since the measurements are correlated due to overlap in the source galaxy samples.



(see Paper I and Section 6 in this paper). The significant shear bias
that we find for fainter objects (15-30%, Fig. 6) is typical for
KSB-based algorithms (Massey et al. 2007), and thus is likely to
affect the study of Okabe et al. (2010), since a large fraction of the
total galaxy sample is fainter than our signal-to-noise criterion
(Fig. 2).

Another difference between the two studies is the radial range over
which the profile is fit. Okabe et al. (2010) fit the shear profile
from the core (1 arcmin, corresponding to 200 kpc at z = 0.2) over the
entire field of view, i.e. out to ~20 arcmin. For the outer radial
cut-off, we fit only out to 3 Mpc, which corresponds to ~ 11 - 15
arcmin for most of the clusters in the comparison set. Numerical
simulations find that mass estimates based on fitting NFW profiles to
radii as large as those used in the Okabe et al. (2010) study are
likely to be biased at the 5-10% level (Becker & Kravtsov 2011; Oguri
& Hamana 2011; Bahe et al. 2011). The bias can be reduced or
essentially eliminated if the fit range is restricted to within
~ $2 \times r_{500}$ (about 3 Mpc for the most massive clusters here)
as in this work. Note, however, that we did not find a significant
mass offset between fitting to 3 Mpc and fitting to 5 Mpc in our own
data (Section 9.2).



Possible biases arising due to the smaller inner radial cut-off chosen
by Okabe et al. (2010) are more difficult to estimate. Simulations do
not indicate a significant bias introduced by fitting to small scales
(Bahe et al. 2011), but observational biases could play a much larger
role. For example, any residual contamination by cluster members would
be most pronounced, and most detrimental to the measured shear, at the
cluster center. Furthermore, shear testing programs have not yet
investigated the calibration bias for shear values typically found at
cluster center (g > 0.1). We also note that Okabe et al. (2010) fit
for the concentration, rather than assuming a specific
mass-concentration relation. Shear values and mass measurements are
most sensitive to the concentration at small radii, hence this may
also cause an offset in the mass measurements.

Zhang et al. (2010) compared weak-lensing mass measurements for 12 of
the clusters in the Okabe et al. (2010) sample to X-ray hydrostatic
mass estimates, determined from XMM-Newton data. At $r_{500}$ they
found no significant bias between the two mass measurements. Our
analysis suggests that this result should be reconsidered, and that
the true mean ratio of XMM-Newton-based hydrostatic mass measurements
(used by those authors) to weak-lensing mass measurements may be
significantly less than unity, of the order of ~75%. Such a result may
be consistent with the results of numerical simulations, which predict
that hydrostatic mass measurements at large radii should be biased of
the order 10-20% due to non-thermal pressure support (Nagai et al.
2007; Lau et al. 2009). However, we caution that systematic
uncertainties in the XMM-Newton calibration remain (Nevalainen et al.
2010). A comparison of our weak-lensing mass estimates with
hydrostatic X-ray masses derived from Chandra data, employing the
latest calibration updates, for the most dynamically relaxed clusters
in our sample will be presented in a forthcoming paper.

10.2 Comparison to Mahdavi et al. (2008) and Hoekstra (2007)


Hoekstra (2007) and Mahdavi et al. (2008) present weak-lensing mass
measurements for 20 clusters, 8 of which are in common with the
present study. The mass measurements of Mahdavi et al. (2008)
supersede those of Hoekstra (2007): while based on the same shear
measurements, the deep field redshift and magnitude catalogs used to
estimate $\langle\beta\rangle$ and $\langle\beta^2\rangle$ were
updated in the later work from the relatively small Hubble Deep Field
(Fernandez-Soto et al. 1999) to the significantly larger CFHT-LS
survey (Ilbert et al. 2006). The shear measurements of those studies
are based on two-color imaging data from the CFH12K camera at the
CFHT.

The mass measurements from Mahdavi et al. (2008) are 10-15% lower than
ours (Fig. 13b). However, the two sets of mass measurements correlate
remarkably well. In general, our color-cut methodology follows closely
that of Hoekstra (2007) and Mahdavi et al. (2008), the most
significant difference being that Mahdavi et al. (2008) use the
aperture-mass method to determine the cluster masses, rather than
fitting an NFW profile. In the aperture-mass method, the mass at a
specific radius is determined only from galaxies at larger projected
radii. Nevertheless, there is considerable overlap between the source
galaxies used in the two studies, since our measurements are based on
the radial range of 0.75-3 Mpc, and the $r_{500}$ measurements of
Mahdavi et al. (2008) are of the order of 1-1.5 Mpc.

The cause for the offset in the Mahdavi et al. (2008) masses is not
entirely clear. Although it is tempting to identify the change of the
reference deep field as the cause of the offset - the original
Hoekstra (2007) masses are ~10% higher than the Mahdavi et al. (2008)
masses - this is unlikely: the Mahdavi et al. (2008) masses should be
more accurate than those of Hoekstra (2007) because the redshift
distributions of COSMOS and CFHT-LS have been shown to be in excellent
agreement with each other (Ilbert et al.). Both Mahdavi et al. (2008)
and Hoekstra (2007) set a similar limiting magnitude for the galaxies
in their source sample as ours; since their exposure times for these
clusters are typically about four times longer than our observations,
but using a telescope of half the diameter, this should yield a
similar minimum signal-to-noise ratio, and avoid the larger shear bias
of lower-SNR objects.
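
The depth argument can be made explicit with the standard scaling for background-limited imaging, in which the signal-to-noise ratio grows with the telescope diameter D and the square root of the exposure time t. The sketch below assumes that scaling and ignores differences in seeing, throughput, pixel scale and sky brightness; the function name is illustrative.

```python
import math


def relative_snr(diameter_ratio, exposure_ratio):
    """Relative S/N for background-limited imaging, assuming S/N ∝ D * sqrt(t)."""
    return diameter_ratio * math.sqrt(exposure_ratio)


# Half the telescope diameter, four times the exposure time.
ratio = relative_snr(diameter_ratio=0.5, exposure_ratio=4.0)
print(f"S/N relative to the reference observation: {ratio:.2f}")
```

Under this scaling the factor of two lost in diameter is exactly recovered by the fourfold exposure time, giving a relative S/N of 1.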

A possible (partial) cause for the difference between the Mahdavi et
al. (2008) results and ours is the use in Mahdavi et al. (2008) of an
NFW profile to correct for the mass-sheet degeneracy, as well as in
converting the measured 2D mass to a 3D mass estimate. Meneghetti et
al. (2010) show that the use of the aperture-mass method with such NFW
corrections can lead to a similar bias as directly fitting an NFW
profile to a large radial range.

Mahdavi et al. (2008) compare their weak-lensing mass measurements to
X-ray hydrostatic mass estimates based on Chandra data. At $r_{500}$,
they find that their hydrostatic masses are lower than the lensing
masses by ~ 20%, for both relaxed and un-relaxed clusters. The offset
between the Mahdavi et al. (2008) lensing mass measurements and our
results at first glance implies that the hydrostatic mass bias at
$r_{500}$ is larger, of the order of 30%. However, we caution that
recent updates to the Chandra calibration also imply changes in
hydrostatic mass measurements at the 10% level. This will be addressed
in a forthcoming paper comparing weak-lensing and hydrostatic mass
estimates.
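
The "of the order of 30%" figure follows from chaining the two ratios quoted in this section. The sketch below only reproduces that arithmetic; the function name is illustrative and the inputs are the approximate percentages from the text (hydrostatic masses ~20% below the Mahdavi et al. lensing masses, which are in turn 10-15% below ours).

```python
def implied_hydrostatic_bias(hse_deficit, lensing_offset):
    """Fractional deficit of hydrostatic masses relative to our lensing masses.

    hse_deficit: fraction by which hydrostatic masses fall below the
        literature lensing masses.
    lensing_offset: fraction by which the literature lensing masses fall
        below ours.
    """
    hse_over_ours = (1.0 - hse_deficit) * (1.0 - lensing_offset)
    return 1.0 - hse_over_ours


biases = [implied_hydrostatic_bias(0.20, offset) for offset in (0.10, 0.15)]
for offset, bias in zip((0.10, 0.15), biases):
    print(f"lensing offset {offset:.0%} -> hydrostatic bias {bias:.0%}")
```

Both ends of the 10-15% range give an implied bias close to 30%, before any change in the hydrostatic masses from the updated Chandra calibration is considered.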

We note that Vikhlinin et al. (2009a) used the original Hoekstra
(2007) mass measurements of 10 low-redshift clusters to verify the
scaling relation between lensing mass and $Y_X$. The Hoekstra (2007)
mass measurements agree well on average with ours, although we again
caution that recent updates to the Chandra calibration may imply
changes in $Y_X$ at the ~10% level.



10.3 Comparison to Bardeau et al. (2007)

Bardeau et al. (2007) measured weak-lensing masses for eleven clusters
using three-band imaging with the CFH12K camera at the CFHT (some of
these data were also used in the Hoekstra 2007 and Mahdavi et al. 2008
analyses). Seven of these clusters are also in our sample. The
comparison to our mass measurements is somewhat inconclusive: while
for five of the seven clusters, the Bardeau et al. (2007) masses are
lower by ~ 30%, their masses for the two most massive clusters in
common with our sample are ~ 25% higher (Fig. 13c).



10.4 Comparison to Pedersen & Dahle (2007), Dahle (2006) and Dahle et al. (2002)



The first large sample of weak lensing mass measurements for galaxy
clusters was compiled by Dahle et al. (2002), with a total sample size
of 38 clusters. These measurements were used by Dahle (2006) to study
the cluster mass function, and by Pedersen & Dahle (2007) to calibrate
the scaling relation between X-ray temperature and total mass. Their
weak lensing observations were obtained with the 2.5-m Nordic Optical
Telescope and the 2.2-m University of Hawaii Telescope. The smaller
aperture, but similar exposure times to the present work, means that
the Dahle et al. data are significantly shallower than ours (and the
other works considered here). Furthermore, most of the data were taken
with single-CCD cameras, restricting the available field of view and
radial fit range.

The weak lensing methodologies of Dahle et al. (2002), Dahle (2006),
and Pedersen & Dahle (2007) are identical; here we compare to the
measurements of $M_{500}$ presented in Pedersen & Dahle (2007), since
our own work focuses in particular on measuring $M_{500}$. There are
twelve clusters in common across the two studies. The scatter between
the two sets of measurements is significant (Fig. 13d), although this
is partly due to the large statistical uncertainties of the shallower
data. Furthermore, the overlap in the source galaxy catalogs between
the two studies is small, since for all but two of the clusters,
Pedersen & Dahle (2007) fit the profile from imaging with a small
field of view, spanning radii of only ~0.9-3 arcmin. In comparison,
our inner radial cut-off corresponds to 2.5-4 arcmin for these
clusters - the measurements are thus nearly independent.

On average, the masses of Pedersen & Dahle (2007) are 10-40% higher
than ours; it is the only literature sample considered here that
overestimates the cluster masses compared to our measurements. It is
important to keep in mind that the Pedersen & Dahle (2007) mass
measurements are largely derived from the inner cluster regions, which
we have explicitly excluded in our analysis. As discussed in Sect. 3,
the correction for cluster galaxy contamination is large in these
regions, and the shear measurement bias for such large shear values
remains uncalibrated by simulations, leading to larger systematic
uncertainties.^6

In part, the apparent mass over-estimates of Pedersen & Dahle (2007)
may also reflect the tendency of the shape measurement algorithm used
in Dahle et al. (2002) to overestimate the true shear (Heymans et al.
2006). However, over the range of simulated shear values,
$|\gamma| < 0.1$, the bias is no more than 5%, and indeed Pedersen &
Dahle (2007) find that accounting for it changes the mass estimates
only by a few percent.



10.5 Minimizing Observer's Bias 

The next stage of our project will be to calibrate current X-ray mass 
proxies. We emphasize that we have deliberately avoided any com- 
parisons between lensing and X-ray derived masses in this study to 
date. In this particular regard (the X-ray to lensing-mass ratio), our
analysis is 'blind' (see the discussion in Allen et al. 2011). To this
end, all previous lensing efforts by teams affiliated with the authors, 
using the same datasets as this paper, were ignored - including raw 
data reduction. Development of the color-cut and P(z) methods 
proceeded in parallel, and key parts of the algorithm and test sim- 
ulations were independently coded and cross-checked. Only once 
both algorithms were finalized, and all cross-calibration and sys- 
tematic uncertainty analysis was complete, did we compare our 
measurements to lensing masses in the literature, as reported above. 
The X-ray analysis team has as yet had no access to the final lens- 
ing masses reported here (and vice versa), while they independently 
update the X-ray masses with improved instrument calibrations and 
analysis packages. All draft copies of this paper have had the lens- 
ing masses redacted for all coauthors, excluding Applegate and von 
der Linden. The two independent efforts will be combined in sub- 
sequent papers. 

After we compared our lensing masses to literature measurements, it
was noted that the mass for MACS1731+22 was measured using a coadded
image containing some frames with seeing smaller than 0.45". We report
updated values in the text after fixing this oversight. The results
did not change appreciably and no conclusions were altered. For
reference, the mass for MACS1731+22



6 We emphasize that weak lensing cluster mass measurement efforts must 
be accompanied by simulations well matched to the observations, in order 
to calibrate biases specific to the observational methodology. 



used in the initial literature comparison was X^^lO i4 M e and the 
color-cut to P(z) method calibration was /? = 0.998^q q42. 



11 CONCLUSIONS & OUTLOOK 

We have developed and employed two separate weak-lensing mass 
measurement algorithms to derive accurate lensing masses for a 
sample of 51 massive, X-ray selected clusters. We have used a tra- 
ditional, but improved, "color-cut" analysis to derive masses for the 
entire sample, and a new method incorporating the full individual 
photo-z posterior probability distributions for galaxies in each clus- 
ter field for the 27 clusters observed in at least five filters. We have 
arrived at the following conclusions: 

• The color-cut method, while requiring the least observing time, does
not easily allow for systematic uncertainties to be quantified.
Systematic uncertainties associated with obtaining appropriately
matched deep fields and other details in the analysis can easily shift
the mean cluster mass by 5%-10%, depending on the cluster-sample
redshift range. It is currently unclear whether the color-cut method
can be successfully extended to high redshifts (z > 0.7) while
simultaneously achieving the 2% systematic uncertainty goal required
for upcoming surveys. 'Stacking' analyses, where many cluster catalogs
are combined for joint analysis, do not evade these sources of
systematic error, though additional systematics (e.g., miscentering)
may dominate in such cases (Rozo et al. 2011).

• The P(z) method, which uses full photo-z posterior probability
distributions, is shown to accurately recover the mean cluster mass of
the present sample. Considering only uncertainties from P(z)
distributions, we recover the mean cluster mass to better than 2%
accuracy when using $B_J V_J R_C I_C z^+$ photometry with current
photo-z codes. This systematic uncertainty is subdominant to current
uncertainties associated with the shear calibration, sample size, and
assumed mass model in the present study. Though requiring observations
made in more filters than the traditional color-cut approach, the
benefits of the P(z) method in both accuracy and the straightforward
quantification of systematic uncertainties are evident.

• Current photo-z point estimators cannot be used to accurately
measure mean cluster masses over the redshift range examined in this
study. We find systematic biases in excess of 5% using algorithms
currently in the literature, with significant redshift dependencies.
In contrast, our method, which uses the full P(z) information, is
accurate to better than 2% for clusters at 0.15 < z < 0.7.

• We have used the subsample of clusters with P(z) mass measurements
to calibrate the systematic uncertainties in our color-cut method. The
agreement in the mean mass for the color-cut and P(z) analysis codes
is excellent, with a mean ratio of 1.00 ± 0.04. The statistical
uncertainty in the mean ratio is comparable to the systematic
uncertainties associated with the shear measurements and the assumed
mass model. The overall systematic uncertainty on the mean mass for
the whole sample of 51 clusters is ≈ 7%.

• Currently, the dominant systematic uncertainties associated 
with both the shear calibration and the mass model are limited by 
available simulations, not data. New STEP-like simulations, using 
the exact PSFs observed in lensing studies, as well as larger-volume 
dark matter simulations probing the appropriate cluster mass range, 
should reduce these systematic uncertainties to below the 2% level. 
We are working with collaborators to realize these improvements. 

• The excellent performance of the P(z) method offers significant
promise for extracting cosmological constraints with galaxy clusters
in future wide, deep imaging surveys. For clusters in the redshift
range of interest to these surveys (z < 0.6), our results show that it
is possible to determine the redshift distributions of lensed galaxies
without relying on deep spectroscopic surveys for calibration (similar
to Cunha et al. 2011). Surveys such as DES and LSST will offer precise
five or six filter photometry similar to that used here, enabling a
straightforward application of the P(z) technique. However, we caution
that the particular filter sets employed by these surveys must be
calibrated for bias against suitably deep fields with precise and
accurate photo-z's calculated from many, e.g. 30+, filters. As the
systematic tolerances push below 2%, a larger deep field with
many-filter coverage and spectroscopic redshift validation will be
needed to verify photo-z performance. The performance of the P(z)
method also suggests that a similar statistical approach to cosmic
shear may prove fruitful, while deep-field tests should provide
insight into possible systematic biases.

• Targeted follow-up of high redshift (z > 0.5) clusters (e.g. SPT or
eRosita follow-up) should benefit from the addition of near-infrared
filters, which should in principle allow the extension of the P(z)
method to higher redshifts. Simulations similar to those pursued in
this study, again utilizing a suitably large and well studied deep
field, should also be undertaken to determine the bias in the cluster
masses obtained in this regime.

The weak-lensing masses reported in this paper have sufficient 
accuracy to realize most of the potential of current X-ray derived 
cluster samples. We emphasize that all measurements reported here 
were derived blindly with respect to X-ray mass measurements, and 
other lensing analyses in the literature. Improved measurements of 
X-ray scaling relations and cosmological parameters using these 
results will be reported in forthcoming papers. 